diff options
Diffstat (limited to 'src/algebra/float.spad.pamphlet')
-rw-r--r-- | src/algebra/float.spad.pamphlet | 1064 |
1 files changed, 1064 insertions, 0 deletions
diff --git a/src/algebra/float.spad.pamphlet b/src/algebra/float.spad.pamphlet new file mode 100644 index 00000000..3f5753f7 --- /dev/null +++ b/src/algebra/float.spad.pamphlet @@ -0,0 +1,1064 @@ +\documentclass{article} +\usepackage{axiom} +\begin{document} +\title{\$SPAD/src/algebra float.spad} +\author{Michael Monagan} +\maketitle +\begin{abstract} +\end{abstract} +\eject +\tableofcontents +\eject +\section{domain FLOAT Float} +As reported in bug number 4733 (rounding of negative numbers) +errors were observed in operations such as +\begin{verbatim} + -> round(-3.9) + -> truncate(-3.9) +\end{verbatim} +The problem is the unexpected behaviour of the shift +with negative integer arguments. +\begin{verbatim} + -> shift(-7,-1) +\end{verbatim} +returns -4 while the code here in float expects the +value to be -3. shift uses the lisp function ASH +'arithmetic shift left' but the spad code expects +an unsigned 'logical' shift. See +\begin{verbatim} + http://www.lispworks.com/reference/HyperSpec/Body/f_ash.htm#ash +\end{verbatim} +A new internal function shift2 is defined in terms of +shift to compensate for the use of ASH and provide the +required function. + +It is currently unknown whether the unexpected behaviour +of shift for negative arguments will cause bugs in other +parts of Axiom. +<<domain FLOAT Float>>= +)abbrev domain FLOAT Float + +B ==> Boolean +I ==> Integer +S ==> String +PI ==> PositiveInteger +RN ==> Fraction Integer +SF ==> DoubleFloat +N ==> NonNegativeInteger + +++ Author: Michael Monagan +++ Date Created: +++ December 1987 +++ Change History: +++ 19 Jun 1990 +++ Basic Operations: outputFloating, outputFixed, outputGeneral, outputSpacing, +++ atan, convert, exp1, log2, log10, normalize, rationalApproximation, +++ relerror, shift, / , ** +++ Keywords: float, floating point, number +++ Description: \spadtype{Float} implements arbitrary precision floating +++ point arithmetic. +++ The number of significant digits of each operation can be set +++ to an arbitrary value (the default is 20 decimal digits). +++ The operation \spad{float(mantissa,exponent,\spadfunFrom{base}{FloatingPointSystem})} for integer +++ \spad{mantissa}, \spad{exponent} specifies the number +++ \spad{mantissa * \spadfunFrom{base}{FloatingPointSystem} ** exponent} +++ The underlying representation for floats is binary +++ not decimal. The implications of this are described below. +++ +++ The model adopted is that arithmetic operations are rounded to +++ to nearest unit in the last place, that is, accurate to within +++ \spad{2**(-\spadfunFrom{bits}{FloatingPointSystem})}. +++ Also, the elementary functions and constants are +++ accurate to one unit in the last place. +++ A float is represented as a record of two integers, the mantissa +++ and the exponent. The \spadfunFrom{base}{FloatingPointSystem} +++ of the representation is binary, hence +++ a \spad{Record(m:mantissa,e:exponent)} represents the number \spad{m * 2 ** e}. +++ Though it is not assumed that the underlying integers are represented +++ with a binary \spadfunFrom{base}{FloatingPointSystem}, +++ the code will be most efficient when this is the +++ the case (this is true in most implementations of Lisp). +++ The decision to choose the \spadfunFrom{base}{FloatingPointSystem} to be +++ binary has some unfortunate +++ consequences. First, decimal numbers like 0.3 cannot be represented +++ exactly. Second, there is a further loss of accuracy during +++ conversion to decimal for output. To compensate for this, if d +++ digits of precision are specified, \spad{1 + ceiling(log2 d)} bits are used. +++ Two numbers that are displayed identically may therefore be +++ not equal. On the other hand, a significant efficiency loss would +++ be incurred if we chose to use a decimal \spadfunFrom{base}{FloatingPointSystem} when the underlying +++ integer base is binary. +++ +++ Algorithms used: +++ For the elementary functions, the general approach is to apply +++ identities so that the taylor series can be used, and, so +++ that it will converge within \spad{O( sqrt n )} steps. For example, +++ using the identity \spad{exp(x) = exp(x/2)**2}, we can compute +++ \spad{exp(1/3)} to n digits of precision as follows. We have +++ \spad{exp(1/3) = exp(2 ** (-sqrt s) / 3) ** (2 ** sqrt s)}. +++ The taylor series will converge in less than sqrt n steps and the +++ exponentiation requires sqrt n multiplications for a total of +++ \spad{2 sqrt n} multiplications. Assuming integer multiplication costs +++ \spad{O( n**2 )} the overall running time is \spad{O( sqrt(n) n**2 )}. +++ This approach is the best known approach for precisions up to +++ about 10,000 digits at which point the methods of Brent +++ which are \spad{O( log(n) n**2 )} become competitive. Note also that +++ summing the terms of the taylor series for the elementary +++ functions is done using integer operations. This avoids the +++ overhead of floating point operations and results in efficient +++ code at low precisions. This implementation makes no attempt +++ to reuse storage, relying on the underlying system to do +++ \spadgloss{garbage collection}. I estimate that the efficiency of this +++ package at low precisions could be improved by a factor of 2 +++ if in-place operations were available. +++ +++ Running times: in the following, n is the number of bits of precision +++ \spad{*}, \spad{/}, \spad{sqrt}, \spad{pi}, \spad{exp1}, \spad{log2}, \spad{log10}: \spad{ O( n**2 )} +++ \spad{exp}, \spad{log}, \spad{sin}, \spad{atan}: \spad{ O( sqrt(n) n**2 )} +++ The other elementary functions are coded in terms of the ones above. + + +Float(): + Join(FloatingPointSystem, DifferentialRing, ConvertibleTo String, OpenMath,_ + CoercibleTo DoubleFloat, TranscendentalFunctionCategory, ConvertibleTo InputForm) with + _/ : (%, I) -> % + ++ x / i computes the division from x by an integer i. + _*_*: (%, %) -> % + ++ x ** y computes \spad{exp(y log x)} where \spad{x >= 0}. + normalize: % -> % + ++ normalize(x) normalizes x at current precision. + relerror : (%, %) -> I + ++ relerror(x,y) computes the absolute value of \spad{x - y} divided by + ++ y, when \spad{y \^= 0}. + shift: (%, I) -> % + ++ shift(x,n) adds n to the exponent of float x. + rationalApproximation: (%, N) -> RN + ++ rationalApproximation(f, n) computes a rational approximation + ++ r to f with relative error \spad{< 10**(-n)}. + rationalApproximation: (%, N, N) -> RN + ++ rationalApproximation(f, n, b) computes a rational + ++ approximation r to f with relative error \spad{< b**(-n)}, that is + ++ \spad{|(r-f)/f| < b**(-n)}. + log2 : () -> % + ++ log2() returns \spad{ln 2}, i.e. \spad{0.6931471805...}. + log10: () -> % + ++ log10() returns \spad{ln 10}: \spad{2.3025809299...}. + exp1 : () -> % + ++ exp1() returns exp 1: \spad{2.7182818284...}. + atan : (%,%) -> % + ++ atan(x,y) computes the arc tangent from x with phase y. + log2 : % -> % + ++ log2(x) computes the logarithm for x to base 2. + log10: % -> % + ++ log10(x) computes the logarithm for x to base 10. + convert: SF -> % + ++ convert(x) converts a \spadtype{DoubleFloat} x to a \spadtype{Float}. + outputFloating: () -> Void + ++ outputFloating() sets the output mode to floating (scientific) notation, i.e. + ++ \spad{mantissa * 10 exponent} is displayed as \spad{0.mantissa E exponent}. + outputFloating: N -> Void + ++ outputFloating(n) sets the output mode to floating (scientific) notation + ++ with n significant digits displayed after the decimal point. + outputFixed: () -> Void + ++ outputFixed() sets the output mode to fixed point notation; + ++ the output will contain a decimal point. + outputFixed: N -> Void + ++ outputFixed(n) sets the output mode to fixed point notation, + ++ with n digits displayed after the decimal point. + outputGeneral: () -> Void + ++ outputGeneral() sets the output mode (default mode) to general + ++ notation; numbers will be displayed in either fixed or floating + ++ (scientific) notation depending on the magnitude. + outputGeneral: N -> Void + ++ outputGeneral(n) sets the output mode to general notation + ++ with n significant digits displayed. + outputSpacing: N -> Void + ++ outputSpacing(n) inserts a space after n (default 10) digits on output; + ++ outputSpacing(0) means no spaces are inserted. + arbitraryPrecision + arbitraryExponent + == add + BASE ==> 2 + BITS:Reference(PI) := ref 68 -- 20 digits + LENGTH ==> INTEGER_-LENGTH$Lisp + ISQRT ==> approxSqrt$IntegerRoots(I) + Rep := Record( mantissa:I, exponent:I ) + StoredConstant ==> Record( precision:PI, value:% ) + UCA ==> Record( unit:%, coef:%, associate:% ) + inc ==> increasePrecision + dec ==> decreasePrecision + + -- local utility operations + shift2 : (I,I) -> I -- WSP: fix bug in shift + times : (%,%) -> % -- multiply x and y with no rounding + itimes: (I,%) -> % -- multiply by a small integer + chop: (%,PI) -> % -- chop x at p bits of precision + dvide: (%,%) -> % -- divide x by y with no rounding + square: (%,I) -> % -- repeated squaring with chopping + power: (%,I) -> % -- x ** n with chopping + plus: (%,%) -> % -- addition with no rounding + sub: (%,%) -> % -- subtraction with no rounding + negate: % -> % -- negation with no rounding + ceillog10base2: PI -> PI -- rational approximation + floorln2: PI -> PI -- rational approximation + + atanSeries: % -> % -- atan(x) by taylor series |x| < 1/2 + atanInverse: I -> % -- atan(1/n) for n an integer > 1 + expInverse: I -> % -- exp(1/n) for n an integer + expSeries: % -> % -- exp(x) by taylor series |x| < 1/2 + logSeries: % -> % -- log(x) by taylor series 1/2 < x < 2 + sinSeries: % -> % -- sin(x) by taylor series |x| < 1/2 + cosSeries: % -> % -- cos(x) by taylor series |x| < 1/2 + piRamanujan: () -> % -- pi using Ramanujans series + + writeOMFloat(dev: OpenMathDevice, x: %): Void == + OMputApp(dev) + OMputSymbol(dev, "bigfloat1", "bigfloat") + OMputInteger(dev, mantissa x) + OMputInteger(dev, 2) + OMputInteger(dev, exponent x) + OMputEndApp(dev) + + OMwrite(x: %): String == + s: String := "" + sp := OM_-STRINGTOSTRINGPTR(s)$Lisp + dev: OpenMathDevice := OMopenString(sp pretend String, OMencodingXML) + OMputObject(dev) + writeOMFloat(dev, x) + OMputEndObject(dev) + OMclose(dev) + s := OM_-STRINGPTRTOSTRING(sp)$Lisp pretend String + s + + OMwrite(x: %, wholeObj: Boolean): String == + s: String := "" + sp := OM_-STRINGTOSTRINGPTR(s)$Lisp + dev: OpenMathDevice := OMopenString(sp pretend String, OMencodingXML) + if wholeObj then + OMputObject(dev) + writeOMFloat(dev, x) + if wholeObj then + OMputEndObject(dev) + OMclose(dev) + s := OM_-STRINGPTRTOSTRING(sp)$Lisp pretend String + s + + OMwrite(dev: OpenMathDevice, x: %): Void == + OMputObject(dev) + writeOMFloat(dev, x) + OMputEndObject(dev) + + OMwrite(dev: OpenMathDevice, x: %, wholeObj: Boolean): Void == + if wholeObj then + OMputObject(dev) + writeOMFloat(dev, x) + if wholeObj then + OMputEndObject(dev) + + shift2(x,y) == sign(x)*shift(sign(x)*x,y) + + asin x == + zero? x => 0 + negative? x => -asin(-x) +-- one? x => pi()/2 + (x = 1) => pi()/2 + x > 1 => error "asin: argument > 1 in magnitude" + inc 5; r := atan(x/sqrt(sub(1,times(x,x)))); dec 5 + normalize r + + acos x == + zero? x => pi()/2 + negative? x => (inc 3; r := pi()-acos(-x); dec 3; normalize r) +-- one? x => 0 + (x = 1) => 0 + x > 1 => error "acos: argument > 1 in magnitude" + inc 5; r := atan(sqrt(sub(1,times(x,x)))/x); dec 5 + normalize r + + atan(x,y) == + x = 0 => + y > 0 => pi()/2 + y < 0 => -pi()/2 + 0 + -- Only count on first quadrant being on principal branch. + theta := atan abs(y/x) + if x < 0 then theta := pi() - theta + if y < 0 then theta := - theta + theta + + atan x == + zero? x => 0 + negative? x => -atan(-x) + if x > 1 then + inc 4 + r := if zero? fractionPart x and x < [bits(),0] then atanInverse wholePart x + else atan(1/x) + r := pi/2 - r + dec 4 + return normalize r + -- make |x| < O( 2**(-sqrt p) ) < 1/2 to speed series convergence + -- by using the formula atan(x) = 2*atan(x/(1+sqrt(1+x**2))) + k := ISQRT (bits()-100)::I quo 5 + k := max(0,2 + k + order x) + inc(2*k) + for i in 1..k repeat x := x/(1+sqrt(1+x*x)) + t := atanSeries x + dec(2*k) + t := shift(t,k) + normalize t + + atanSeries x == + -- atan(x) = x (1 - x**2/3 + x**4/5 - x**6/7 + ...) |x| < 1 + p := bits() + LENGTH bits() + 2 + s:I := d:I := shift(1,p) + y := times(x,x) + t := m := - shift2(y.mantissa,y.exponent+p) + for i in 3.. by 2 while t ^= 0 repeat + s := s + t quo i + t := (m * t) quo d + x * [s,-p] + + atanInverse n == + -- compute atan(1/n) for an integer n > 1 + -- atan n = 1/n - 1/n**3/3 + 1/n**5/4 - ... + -- pi = 16 atan(1/5) - 4 atan(1/239) + n2 := -n*n + e:I := bits() + LENGTH bits() + LENGTH n + 1 + s:I := shift(1,e) quo n + t:I := s quo n2 + for k in 3.. by 2 while t ^= 0 repeat + s := s + t quo k + t := t quo n2 + normalize [s,-e] + + sin x == + s := sign x; x := abs x; p := bits(); inc 4 + if x > [6,0] then (inc p; x := 2*pi*fractionPart(x/pi/2); bits p) + if x > [3,0] then (inc p; s := -s; x := x - pi; bits p) + if x > [3,-1] then (inc p; x := pi - x; dec p) + -- make |x| < O( 2**(-sqrt p) ) < 1/2 to speed series convergence + -- by using the formula sin(3*x/3) = 3 sin(x/3) - 4 sin(x/3)**3 + -- the running time is O( sqrt p M(p) ) assuming |x| < 1 + k := ISQRT (bits()-100)::I quo 4 + k := max(0,2 + k + order x) + if k > 0 then (inc k; x := x / 3**k::N) + r := sinSeries x + for i in 1..k repeat r := itimes(3,r)-shift(r**3,2) + bits p + s * r + + sinSeries x == + -- sin(x) = x (1 - x**2/3! + x**4/5! - x**6/7! + ... |x| < 1/2 + p := bits() + LENGTH bits() + 2 + y := times(x,x) + s:I := d:I := shift(1,p) + m:I := - shift2(y.mantissa,y.exponent+p) + t:I := m quo 6 + for i in 4.. by 2 while t ^= 0 repeat + s := s + t + t := (m * t) quo (i*(i+1)) + t := t quo d + x * [s,-p] + + cos x == + s:I := 1; x := abs x; p := bits(); inc 4 + if x > [6,0] then (inc p; x := 2*pi*fractionPart(x/pi/2); dec p) + if x > [3,0] then (inc p; s := -s; x := x-pi; dec p) + if x > [1,0] then + -- take care of the accuracy problem near pi/2 + inc p; x := pi/2-x; bits p; x := normalize x + return (s * sin x) + -- make |x| < O( 2**(-sqrt p) ) < 1/2 to speed series convergence + -- by using the formula cos(2*x/2) = 2 cos(x/2)**2 - 1 + -- the running time is O( sqrt p M(p) ) assuming |x| < 1 + k := ISQRT (bits()-100)::I quo 3 + k := max(0,2 + k + order x) + -- need to increase precision by more than k, otherwise recursion + -- causes loss of accuracy. + -- Michael Monagan suggests adding a factor of log(k) + if k > 0 then (inc(k+length(k)**2); x := shift(x,-k)) + r := cosSeries x + for i in 1..k repeat r := shift(r*r,1)-1 + bits p + s * r + + + + cosSeries x == + -- cos(x) = 1 - x**2/2! + x**4/4! - x**6/6! + ... |x| < 1/2 + p := bits() + LENGTH bits() + 1 + y := times(x,x) + s:I := d:I := shift(1,p) + m:I := - shift2(y.mantissa,y.exponent+p) + t:I := m quo 2 + for i in 3.. by 2 while t ^= 0 repeat + s := s + t + t := (m * t) quo (i*(i+1)) + t := t quo d + normalize [s,-p] + + tan x == + s := sign x; x := abs x; p := bits(); inc 6 + if x > [3,0] then (inc p; x := pi()*fractionPart(x/pi()); dec p) + if x > [3,-1] then (inc p; x := pi()-x; s := -s; dec p) + if x > 1 then (c := cos x; t := sqrt(1-c*c)/c) + else (c := sin x; t := c/sqrt(1-c*c)) + bits p + s * t + + P:StoredConstant := [1,[1,2]] + pi() == + -- We use Ramanujan's identity to compute pi. + -- The running time is quadratic in the precision. + -- This is about twice as fast as Machin's identity on Lisp/VM + -- pi = 16 atan(1/5) - 4 atan(1/239) + bits() <= P.precision => normalize P.value + (P := [bits(), piRamanujan()]) value + + piRamanujan() == + -- Ramanujans identity for 1/pi + -- Reference: Shanks and Wrench, Math Comp, 1962 + -- "Calculation of pi to 100,000 Decimals". + n := bits() + LENGTH bits() + 11 + t:I := shift(1,n) quo 882 + d:I := 4*882**2 + s:I := 0 + for i in 2.. by 2 for j in 1123.. by 21460 while t ^= 0 repeat + s := s + j*t + m := -(i-1)*(2*i-1)*(2*i-3) + t := (m*t) quo (d*i**3) + 1 / [s,-n-2] + + sinh x == + zero? x => 0 + lost:I := max(- order x,0) + 2*lost > bits() => x + inc(5+lost); e := exp x; s := (e-1/e)/2; dec(5+lost) + normalize s + + cosh x == + (inc 5; e := exp x; c := (e+1/e)/2; dec 5; normalize c) + + tanh x == + zero? x => 0 + lost:I := max(- order x,0) + 2*lost > bits() => x + inc(6+lost); e := exp x; e := e*e; t := (e-1)/(e+1); dec(6+lost) + normalize t + + asinh x == + p := min(0,order x) + if zero? x or 2*p < -bits() then return x + inc(5-p); r := log(x+sqrt(1+x*x)); dec(5-p) + normalize r + + acosh x == + if x < 1 then error "invalid argument to acosh" + inc 5; r := log(x+sqrt(sub(times(x,x),1))); dec 5 + normalize r + + atanh x == + if x > 1 or x < -1 then error "invalid argument to atanh" + p := min(0,order x) + if zero? x or 2*p < -bits() then return x + inc(5-p); r := log((x+1)/(1-x))/2; dec(5-p) + normalize r + + log x == + negative? x => error "negative log" + zero? x => error "log 0 generated" + p := bits(); inc 5 + -- apply log(x) = n log 2 + log(x/2**n) so that 1/2 < x < 2 + if (n := order x) < 0 then n := n+1 + l := if n = 0 then 0 else (x := shift(x,-n); n * log2) + -- speed the series convergence by finding m and k such that + -- | exp(m/2**k) x - 1 | < 1 / 2 ** O(sqrt p) + -- write log(exp(m/2**k) x) as m/2**k + log x + k := ISQRT (p-100)::I quo 3 + if k > 1 then + k := max(1,k+order(x-1)) + inc k + ek := expInverse (2**k::N) + dec(p quo 2); m := order square(x,k); inc(p quo 2) + m := (6847196937 * m) quo 9878417065 -- m := m log 2 + x := x * ek ** (-m) + l := l + [m,-k] + l := l + logSeries x + bits p + normalize l + + logSeries x == + -- log(x) = 2 y (1 + y**2/3 + y**4/5 ...) for y = (x-1) / (x+1) + -- given 1/2 < x < 2 on input we have -1/3 < y < 1/3 + p := bits() + (g := LENGTH bits() + 3) + inc g; y := (x-1)/(x+1); dec g + s:I := d:I := shift(1,p) + z := times(y,y) + t := m := shift2(z.mantissa,z.exponent+p) + for i in 3.. by 2 while t ^= 0 repeat + s := s + t quo i + t := m * t quo d + y * [s,1-p] + + L2:StoredConstant := [1,1] + log2() == + -- log x = 2 * sum( ((x-1)/(x+1))**(2*k+1)/(2*k+1), k=1.. ) + -- log 2 = 2 * sum( 1/9**k / (2*k+1), k=0..n ) / 3 + n := bits() :: N + n <= L2.precision => normalize L2.value + n := n + LENGTH n + 3 -- guard bits + s:I := shift(1,n+1) quo 3 + t:I := s quo 9 + for k in 3.. by 2 while t ^= 0 repeat + s := s + t quo k + t := t quo 9 + L2 := [bits(),[s,-n]] + normalize L2.value + + L10:StoredConstant := [1,[1,1]] + log10() == + -- log x = 2 * sum( ((x-1)/(x+1))**(2*k+1)/(2*k+1), k=0.. ) + -- log 5/4 = 2 * sum( 1/81**k / (2*k+1), k=0.. ) / 9 + n := bits() :: N + n <= L10.precision => normalize L10.value + n := n + LENGTH n + 5 -- guard bits + s:I := shift(1,n+1) quo 9 + t:I := s quo 81 + for k in 3.. by 2 while t ^= 0 repeat + s := s + t quo k + t := t quo 81 + -- We have log 10 = log 5 + log 2 and log 5/4 = log 5 - 2 log 2 + inc 2; L10 := [bits(),[s,-n] + 3*log2]; dec 2 + normalize L10.value + + log2(x) == (inc 2; r := log(x)/log2; dec 2; normalize r) + log10(x) == (inc 2; r := log(x)/log10; dec 2; normalize r) + + exp(x) == + -- exp(n+x) = exp(1)**n exp(x) for n such that |x| < 1 + p := bits(); inc 5; e1:% := 1 + if (n := wholePart x) ^= 0 then + inc LENGTH n; e1 := exp1 ** n; dec LENGTH n + x := fractionPart x + if zero? x then (bits p; return normalize e1) + -- make |x| < O( 2**(-sqrt p) ) < 1/2 to speed series convergence + -- by repeated use of the formula exp(2*x/2) = exp(x/2)**2 + -- results in an overall running time of O( sqrt p M(p) ) + k := ISQRT (p-100)::I quo 3 + k := max(0,2 + k + order x) + if k > 0 then (inc k; x := shift(x,-k)) + e := expSeries x + if k > 0 then e := square(e,k) + bits p + e * e1 + + expSeries x == + -- exp(x) = 1 + x + x**2/2 + ... + x**i/i! valid for all x + p := bits() + LENGTH bits() + 1 + s:I := d:I := shift(1,p) + t:I := n:I := shift2(x.mantissa,x.exponent+p) + for i in 2.. while t ^= 0 repeat + s := s + t + t := (n * t) quo i + t := t quo d + normalize [s,-p] + + expInverse k == + -- computes exp(1/k) via continued fraction + p0:I := 2*k+1; p1:I := 6*k*p0+1 + q0:I := 2*k-1; q1:I := 6*k*q0+1 + for i in 10*k.. by 4*k while 2 * LENGTH p0 < bits() repeat + (p0,p1) := (p1,i*p1+p0) + (q0,q1) := (q1,i*q1+q0) + dvide([p1,0],[q1,0]) + + E:StoredConstant := [1,[1,1]] + exp1() == + if bits() > E.precision then E := [bits(),expInverse 1] + normalize E.value + + sqrt x == + negative? x => error "negative sqrt" + m := x.mantissa; e := x.exponent + l := LENGTH m + p := 2 * bits() - l + 2 + if odd?(e-l) then p := p - 1 + i := shift2(x.mantissa,p) + -- ISQRT uses a variable precision newton iteration + i := ISQRT i + normalize [i,(e-p) quo 2] + + bits() == BITS() + bits(n) == (t := bits(); BITS() := n; t) + precision() == bits() + precision(n) == bits(n) + increasePrecision n == (b := bits(); bits((b + n)::PI); b) + decreasePrecision n == (b := bits(); bits((b - n)::PI); b) + ceillog10base2 n == ((13301 * n + 4003) quo 4004) :: PI + digits() == max(1,4004 * (bits()-1) quo 13301)::PI + digits(n) == (t := digits(); bits (1 + ceillog10base2 n); t) + + order(a) == LENGTH a.mantissa + a.exponent - 1 + relerror(a,b) == order((a-b)/b) + 0 == [0,0] + 1 == [1,0] + base() == BASE + mantissa x == x.mantissa + exponent x == x.exponent + one? a == a = 1 + zero? a == zero?(a.mantissa) + negative? a == negative?(a.mantissa) + positive? a == positive?(a.mantissa) + + chop(x,p) == + e : I := LENGTH x.mantissa - p + if e > 0 then x := [shift2(x.mantissa,-e),x.exponent+e] + x + float(m,e) == normalize [m,e] + float(m,e,b) == + m = 0 => 0 + inc 4; r := m * [b,0] ** e; dec 4 + normalize r + normalize x == + m := x.mantissa + m = 0 => 0 + e : I := LENGTH m - bits() + if e > 0 then + y := shift2(m,1-e) + if odd? y then + y := (if y>0 then y+1 else y-1) quo 2 + if LENGTH y > bits() then + y := y quo 2 + e := e+1 + else y := y quo 2 + x := [y,x.exponent+e] + x + shift(x:%,n:I) == [x.mantissa,x.exponent+n] + + x = y == + order x = order y and sign x = sign y and zero? (x - y) + x < y == + y.mantissa = 0 => x.mantissa < 0 + x.mantissa = 0 => y.mantissa > 0 + negative? x and positive? y => true + negative? y and positive? x => false + order x < order y => positive? x + order x > order y => negative? x + negative? (x-y) + + abs x == if negative? x then -x else normalize x + ceiling x == + if negative? x then return (-floor(-x)) + if zero? fractionPart x then x else truncate x + 1 + wholePart x == shift2(x.mantissa,x.exponent) + floor x == if negative? x then -ceiling(-x) else truncate x + round x == (half := [sign x,-1]; truncate(x + half)) + sign x == if x.mantissa < 0 then -1 else 1 + truncate x == + if x.exponent >= 0 then return x + normalize [shift2(x.mantissa,x.exponent),0] + recip(x) == if x=0 then "failed" else 1/x + differentiate x == 0 + + - x == normalize negate x + negate x == [-x.mantissa,x.exponent] + x + y == normalize plus(x,y) + x - y == normalize plus(x,negate y) + sub(x,y) == plus(x,negate y) + plus(x,y) == + mx := x.mantissa; my := y.mantissa + mx = 0 => y + my = 0 => x + ex := x.exponent; ey := y.exponent + ex = ey => [mx+my,ex] + de := ex + LENGTH mx - ey - LENGTH my + de > bits()+1 => x + de < -(bits()+1) => y + if ex < ey then (mx,my,ex,ey) := (my,mx,ey,ex) + mw := my + shift2(mx,ex-ey) + [mw,ey] + + x:% * y:% == normalize times (x,y) + x:I * y:% == + if LENGTH x > bits() then normalize [x,0] * y + else normalize [x * y.mantissa,y.exponent] + x:% / y:% == normalize dvide(x,y) + x:% / y:I == + if LENGTH y > bits() then x / normalize [y,0] else x / [y,0] + inv x == 1 / x + + times(x:%,y:%) == [x.mantissa * y.mantissa, x.exponent + y.exponent] + itimes(n:I,y:%) == [n * y.mantissa,y.exponent] + + dvide(x,y) == + ew := LENGTH y.mantissa - LENGTH x.mantissa + bits() + 1 + mw := shift2(x.mantissa,ew) quo y.mantissa + ew := x.exponent - y.exponent - ew + [mw,ew] + + square(x,n) == + ma := x.mantissa; ex := x.exponent + for k in 1..n repeat + ma := ma * ma; ex := ex + ex + l:I := bits()::I - LENGTH ma + ma := shift2(ma,l); ex := ex - l + [ma,ex] + + power(x,n) == + y:% := 1; z:% := x + repeat + if odd? n then y := chop( times(y,z), bits() ) + if (n := n quo 2) = 0 then return y + z := chop( times(z,z), bits() ) + + x:% ** y:% == + x = 0 => + y = 0 => error "0**0 is undefined" + y < 0 => error "division by 0" + y > 0 => 0 + y = 0 => 1 + y = 1 => x + x = 1 => 1 + p := abs order y + 5 + inc p; r := exp(y*log(x)); dec p + normalize r + + x:% ** r:RN == + x = 0 => + r = 0 => error "0**0 is undefined" + r < 0 => error "division by 0" + r > 0 => 0 + r = 0 => 1 + r = 1 => x + x = 1 => 1 + n := numer r + d := denom r + negative? x => + odd? d => + odd? n => return -((-x)**r) + return ((-x)**r) + error "negative root" + if d = 2 then + inc LENGTH n; y := sqrt(x); y := y**n; dec LENGTH n + return normalize y + y := [n,0]/[d,0] + x ** y + + x:% ** n:I == + x = 0 => + n = 0 => error "0**0 is undefined" + n < 0 => error "division by 0" + n > 0 => 0 + n = 0 => 1 + n = 1 => x + x = 1 => 1 + p := bits() + bits(p + LENGTH n + 2) + y := power(x,abs n) + if n < 0 then y := dvide(1,y) + bits p + normalize y + + -- Utility routines for conversion to decimal + ceilLength10: I -> I + chop10: (%,I) -> % + convert10:(%,I) -> % + floorLength10: I -> I + length10: I -> I + normalize10: (%,I) -> % + quotient10: (%,%,I) -> % + power10: (%,I,I) -> % + times10: (%,%,I) -> % + + convert10(x,d) == + m := x.mantissa; e := x.exponent + --!! deal with bits here + b := bits(); (q,r) := divide(abs e, b) + b := 2**b::N; r := 2**r::N + -- compute 2**e = b**q * r + h := power10([b,0],q,d+5) + h := chop10([r*h.mantissa,h.exponent],d+5) + if e < 0 then h := quotient10([m,0],h,d) + else times10([m,0],h,d) + + ceilLength10 n == 146 * LENGTH n quo 485 + 1 + floorLength10 n == 643 * LENGTH n quo 2136 +-- length10 n == DECIMAL_-LENGTH(n)$Lisp + length10 n == + ln := LENGTH(n:=abs n) + upper := 76573 * ln quo 254370 + lower := 21306 * (ln-1) quo 70777 + upper = lower => upper + 1 + n := n quo (10**lower::N) + while n >= 10 repeat + n:= n quo 10 + lower := lower + 1 + lower + 1 + + chop10(x,p) == + e : I := floorLength10 x.mantissa - p + if e > 0 then x := [x.mantissa quo 10**e::N,x.exponent+e] + x + normalize10(x,p) == + ma := x.mantissa + ex := x.exponent + e : I := length10 ma - p + if e > 0 then + ma := ma quo 10**(e-1)::N + ex := ex + e + (ma,r) := divide(ma, 10) + if r > 4 then + ma := ma + 1 + if ma = 10**p::N then (ma := 1; ex := ex + p) + [ma,ex] + times10(x,y,p) == normalize10(times(x,y),p) + quotient10(x,y,p) == + ew := floorLength10 y.mantissa - ceilLength10 x.mantissa + p + 2 + if ew < 0 then ew := 0 + mw := (x.mantissa * 10**ew::N) quo y.mantissa + ew := x.exponent - y.exponent - ew + normalize10([mw,ew],p) + power10(x,n,d) == + x = 0 => 0 + n = 0 => 1 + n = 1 => x + x = 1 => 1 + p:I := d + LENGTH n + 1 + e:I := n + y:% := 1 + z:% := x + repeat + if odd? e then y := chop10(times(y,z),p) + if (e := e quo 2) = 0 then return y + z := chop10(times(z,z),p) + + -------------------------------- + -- Output routines for Floats -- + -------------------------------- + zero ==> char("0") + separator ==> space()$Character + + SPACING : Reference(N) := ref 10 + OUTMODE : Reference(S) := ref "general" + OUTPREC : Reference(I) := ref(-1) + + fixed : % -> S + floating : % -> S + general : % -> S + + padFromLeft(s:S):S == + zero? SPACING() => s + n:I := #s - 1 + t := new( (n + 1 + n quo SPACING()) :: N , separator ) + for i in 0..n for j in minIndex t .. repeat + t.j := s.(i + minIndex s) + if (i+1) rem SPACING() = 0 then j := j+1 + t + padFromRight(s:S):S == + SPACING() = 0 => s + n:I := #s - 1 + t := new( (n + 1 + n quo SPACING()) :: N , separator ) + for i in n..0 by -1 for j in maxIndex t .. by -1 repeat + t.j := s.(i + minIndex s) + if (n-i+1) rem SPACING() = 0 then j := j-1 + t + + fixed f == + zero? f => "0.0" + zero? exponent f => + padFromRight concat(convert(mantissa f)@S, ".0") + negative? f => concat("-", fixed abs f) + d := if OUTPREC() = -1 then digits::I else OUTPREC() +-- g := convert10(abs f,digits); m := g.mantissa; e := g.exponent + g := convert10(abs f,d); m := g.mantissa; e := g.exponent + if OUTPREC() ^= -1 then + -- round g to OUTPREC digits after the decimal point + l := length10 m + if -e > OUTPREC() and -e < 2*digits::I then + g := normalize10(g,l+e+OUTPREC()) + m := g.mantissa; e := g.exponent + s := convert(m)@S; n := #s; o := e+n + p := if OUTPREC() = -1 then n::I else OUTPREC() + t:S + if e >= 0 then + s := concat(s, new(e::N, zero)) + t := "" + else if o <= 0 then + t := concat(new((-o)::N,zero), s) + s := "0" + else + t := s(o + minIndex s .. n + minIndex s - 1) + s := s(minIndex s .. o + minIndex s - 1) + n := #t + if OUTPREC() = -1 then + t := rightTrim(t,zero) + if t = "" then t := "0" + else if n > p then t := t(minIndex t .. p + minIndex t- 1) + else t := concat(t, new((p-n)::N,zero)) + concat(padFromRight s, concat(".", padFromLeft t)) + + floating f == + zero? f => "0.0" + negative? f => concat("-", floating abs f) + t:S := if zero? SPACING() then "E" else " E " + zero? exponent f => + s := convert(mantissa f)@S + concat ["0.", padFromLeft s, t, convert(#s)@S] + -- base conversion to decimal rounded to the requested precision + d := if OUTPREC() = -1 then digits::I else OUTPREC() + g := convert10(f,d); m := g.mantissa; e := g.exponent + -- I'm assuming that length10 m = # s given n > 0 + s := convert(m)@S; n := #s; o := e+n + s := padFromLeft s + concat ["0.", s, t, convert(o)@S] + + general(f) == + zero? f => "0.0" + negative? f => concat("-", general abs f) + d := if OUTPREC() = -1 then digits::I else OUTPREC() + zero? exponent f => + d := d + 1 + s := convert(mantissa f)@S + OUTPREC() ^= -1 and (e := #s) > d => + t:S := if zero? SPACING() then "E" else " E " + concat ["0.", padFromLeft s, t, convert(e)@S] + padFromRight concat(s, ".0") + -- base conversion to decimal rounded to the requested precision + g := convert10(f,d); m := g.mantissa; e := g.exponent + -- I'm assuming that length10 m = # s given n > 0 + s := convert(m)@S; n := #s; o := n + e + -- Note: at least one digit is displayed after the decimal point + -- and trailing zeroes after the decimal point are dropped + if o > 0 and o <= max(n,d) then + -- fixed format: add trailing zeroes before the decimal point + if o > n then s := concat(s, new((o-n)::N,zero)) + t := rightTrim(s(o + minIndex s .. n + minIndex s - 1), zero) + if t = "" then t := "0" else t := padFromLeft t + s := padFromRight s(minIndex s .. o + minIndex s - 1) + concat(s, concat(".", t)) + else if o <= 0 and o >= -5 then + -- fixed format: up to 5 leading zeroes after the decimal point + concat("0.",padFromLeft concat(new((-o)::N,zero),rightTrim(s,zero))) + else + -- print using E format written 0.mantissa E exponent + t := padFromLeft rightTrim(s,zero) + s := if zero? SPACING() then "E" else " E " + concat ["0.", t, s, convert(e+n)@S] + + outputSpacing n == SPACING() := n + outputFixed() == (OUTMODE() := "fixed"; OUTPREC() := -1) + outputFixed n == (OUTMODE() := "fixed"; OUTPREC() := n::I) + outputGeneral() == (OUTMODE() := "general"; OUTPREC() := -1) + outputGeneral n == (OUTMODE() := "general"; OUTPREC() := n::I) + outputFloating() == (OUTMODE() := "floating"; OUTPREC() := -1) + outputFloating n == (OUTMODE() := "floating"; OUTPREC() := n::I) + + convert(f):S == + b:Integer := + OUTPREC() = -1 and not zero? f => + bits(length(abs mantissa f)::PositiveInteger) + 0 + s := + OUTMODE() = "fixed" => fixed f + OUTMODE() = "floating" => floating f + OUTMODE() = "general" => general f + empty()$String + if b > 0 then bits(b::PositiveInteger) + s = empty()$String => error "bad output mode" + s + + coerce(f):OutputForm == + f >= 0 => message(convert(f)@S) + - (coerce(-f)@OutputForm) + + convert(f):InputForm == + convert [convert("float"::Symbol), convert mantissa f, + convert exponent f, convert base()]$List(InputForm) + + -- Conversion routines + convert(x:%):Float == x pretend Float + convert(x:%):SF == makeSF(x.mantissa,x.exponent)$Lisp + coerce(x:%):SF == convert(x)@SF + convert(sf:SF):% == float(mantissa sf,exponent sf,base()$SF) + + retract(f:%):RN == rationalApproximation(f,(bits()-1)::N,BASE) + + retractIfCan(f:%):Union(RN, "failed") == + rationalApproximation(f,(bits()-1)::N,BASE) + + retract(f:%):I == + (f = (n := wholePart f)::%) => n + error "Not an integer" + + retractIfCan(f:%):Union(I, "failed") == + (f = (n := wholePart f)::%) => n + "failed" + + rationalApproximation(f,d) == rationalApproximation(f,d,10) + + rationalApproximation(f,d,b) == + t: Integer + nu := f.mantissa; ex := f.exponent + if ex >= 0 then return ((nu*BASE**(ex::N))/1) + de := BASE**((-ex)::N) + if b < 2 then error "base must be > 1" + tol := b**d + s := nu; t := de + p0,p1,q0,q1 : Integer + p0 := 0; p1 := 1; q0 := 1; q1 := 0 + repeat + (q,r) := divide(s, t) + p2 := q*p1+p0 + q2 := q*q1+q0 + if r = 0 or tol*abs(nu*q2-de*p2) < de*abs(p2) then return (p2/q2) + (p0,p1) := (p1,p2) + (q0,q1) := (q1,q2) + (s,t) := (t,r) + +@ + +--% Float: arbitrary precision floating point arithmetic domain + +\section{License} +<<license>>= +--Copyright (c) 1991-2002, The Numerical ALgorithms Group Ltd. +--All rights reserved. +-- +--Redistribution and use in source and binary forms, with or without +--modification, are permitted provided that the following conditions are +--met: +-- +-- - Redistributions of source code must retain the above copyright +-- notice, this list of conditions and the following disclaimer. +-- +-- - Redistributions in binary form must reproduce the above copyright +-- notice, this list of conditions and the following disclaimer in +-- the documentation and/or other materials provided with the +-- distribution. +-- +-- - Neither the name of The Numerical ALgorithms Group Ltd. nor the +-- names of its contributors may be used to endorse or promote products +-- derived from this software without specific prior written permission. +-- +--THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS +--IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED +--TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A +--PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER +--OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, +--EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, +--PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR +--PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF +--LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING +--NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS +--SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. +@ +<<*>>= +<<license>> + +<<domain FLOAT Float>> +@ +\eject +\begin{thebibliography}{99} +\bibitem{1} nothing +\end{thebibliography} +\end{document} |