aboutsummaryrefslogtreecommitdiff
path: root/src/algebra/float.spad.pamphlet
diff options
context:
space:
mode:
Diffstat (limited to 'src/algebra/float.spad.pamphlet')
-rw-r--r--src/algebra/float.spad.pamphlet1064
1 files changed, 1064 insertions, 0 deletions
diff --git a/src/algebra/float.spad.pamphlet b/src/algebra/float.spad.pamphlet
new file mode 100644
index 00000000..3f5753f7
--- /dev/null
+++ b/src/algebra/float.spad.pamphlet
@@ -0,0 +1,1064 @@
+\documentclass{article}
+\usepackage{axiom}
+\begin{document}
+\title{\$SPAD/src/algebra float.spad}
+\author{Michael Monagan}
+\maketitle
+\begin{abstract}
+\end{abstract}
+\eject
+\tableofcontents
+\eject
+\section{domain FLOAT Float}
+As reported in bug number 4733 (rounding of negative numbers)
+errors were observed in operations such as
+\begin{verbatim}
+ -> round(-3.9)
+ -> truncate(-3.9)
+\end{verbatim}
+The problem is the unexpected behaviour of the shift
+with negative integer arguments.
+\begin{verbatim}
+ -> shift(-7,-1)
+\end{verbatim}
+returns -4 while the code here in float expects the
+value to be -3. shift uses the lisp function ASH
+'arithmetic shift left' but the spad code expects
+an unsigned 'logical' shift. See
+\begin{verbatim}
+ http://www.lispworks.com/reference/HyperSpec/Body/f_ash.htm#ash
+\end{verbatim}
+A new internal function shift2 is defined in terms of
+shift to compensate for the use of ASH and provide the
+required function.
+
+It is currently unknown whether the unexpected behaviour
+of shift for negative arguments will cause bugs in other
+parts of Axiom.
+<<domain FLOAT Float>>=
+)abbrev domain FLOAT Float
+
+B ==> Boolean
+I ==> Integer
+S ==> String
+PI ==> PositiveInteger
+RN ==> Fraction Integer
+SF ==> DoubleFloat
+N ==> NonNegativeInteger
+
+++ Author: Michael Monagan
+++ Date Created:
+++ December 1987
+++ Change History:
+++ 19 Jun 1990
+++ Basic Operations: outputFloating, outputFixed, outputGeneral, outputSpacing,
+++ atan, convert, exp1, log2, log10, normalize, rationalApproximation,
+++ relerror, shift, / , **
+++ Keywords: float, floating point, number
+++ Description: \spadtype{Float} implements arbitrary precision floating
+++ point arithmetic.
+++ The number of significant digits of each operation can be set
+++ to an arbitrary value (the default is 20 decimal digits).
+++ The operation \spad{float(mantissa,exponent,\spadfunFrom{base}{FloatingPointSystem})} for integer
+++ \spad{mantissa}, \spad{exponent} specifies the number
+++ \spad{mantissa * \spadfunFrom{base}{FloatingPointSystem} ** exponent}
+++ The underlying representation for floats is binary
+++ not decimal. The implications of this are described below.
+++
+++ The model adopted is that arithmetic operations are rounded to
+++ to nearest unit in the last place, that is, accurate to within
+++ \spad{2**(-\spadfunFrom{bits}{FloatingPointSystem})}.
+++ Also, the elementary functions and constants are
+++ accurate to one unit in the last place.
+++ A float is represented as a record of two integers, the mantissa
+++ and the exponent. The \spadfunFrom{base}{FloatingPointSystem}
+++ of the representation is binary, hence
+++ a \spad{Record(m:mantissa,e:exponent)} represents the number \spad{m * 2 ** e}.
+++ Though it is not assumed that the underlying integers are represented
+++ with a binary \spadfunFrom{base}{FloatingPointSystem},
+++ the code will be most efficient when this is the
+++ the case (this is true in most implementations of Lisp).
+++ The decision to choose the \spadfunFrom{base}{FloatingPointSystem} to be
+++ binary has some unfortunate
+++ consequences. First, decimal numbers like 0.3 cannot be represented
+++ exactly. Second, there is a further loss of accuracy during
+++ conversion to decimal for output. To compensate for this, if d
+++ digits of precision are specified, \spad{1 + ceiling(log2 d)} bits are used.
+++ Two numbers that are displayed identically may therefore be
+++ not equal. On the other hand, a significant efficiency loss would
+++ be incurred if we chose to use a decimal \spadfunFrom{base}{FloatingPointSystem} when the underlying
+++ integer base is binary.
+++
+++ Algorithms used:
+++ For the elementary functions, the general approach is to apply
+++ identities so that the taylor series can be used, and, so
+++ that it will converge within \spad{O( sqrt n )} steps. For example,
+++ using the identity \spad{exp(x) = exp(x/2)**2}, we can compute
+++ \spad{exp(1/3)} to n digits of precision as follows. We have
+++ \spad{exp(1/3) = exp(2 ** (-sqrt s) / 3) ** (2 ** sqrt s)}.
+++ The taylor series will converge in less than sqrt n steps and the
+++ exponentiation requires sqrt n multiplications for a total of
+++ \spad{2 sqrt n} multiplications. Assuming integer multiplication costs
+++ \spad{O( n**2 )} the overall running time is \spad{O( sqrt(n) n**2 )}.
+++ This approach is the best known approach for precisions up to
+++ about 10,000 digits at which point the methods of Brent
+++ which are \spad{O( log(n) n**2 )} become competitive. Note also that
+++ summing the terms of the taylor series for the elementary
+++ functions is done using integer operations. This avoids the
+++ overhead of floating point operations and results in efficient
+++ code at low precisions. This implementation makes no attempt
+++ to reuse storage, relying on the underlying system to do
+++ \spadgloss{garbage collection}. I estimate that the efficiency of this
+++ package at low precisions could be improved by a factor of 2
+++ if in-place operations were available.
+++
+++ Running times: in the following, n is the number of bits of precision
+++ \spad{*}, \spad{/}, \spad{sqrt}, \spad{pi}, \spad{exp1}, \spad{log2}, \spad{log10}: \spad{ O( n**2 )}
+++ \spad{exp}, \spad{log}, \spad{sin}, \spad{atan}: \spad{ O( sqrt(n) n**2 )}
+++ The other elementary functions are coded in terms of the ones above.
+
+
+Float():
+ Join(FloatingPointSystem, DifferentialRing, ConvertibleTo String, OpenMath,_
+ CoercibleTo DoubleFloat, TranscendentalFunctionCategory, ConvertibleTo InputForm) with
+ _/ : (%, I) -> %
+ ++ x / i computes the division from x by an integer i.
+ _*_*: (%, %) -> %
+ ++ x ** y computes \spad{exp(y log x)} where \spad{x >= 0}.
+ normalize: % -> %
+ ++ normalize(x) normalizes x at current precision.
+ relerror : (%, %) -> I
+ ++ relerror(x,y) computes the absolute value of \spad{x - y} divided by
+ ++ y, when \spad{y \^= 0}.
+ shift: (%, I) -> %
+ ++ shift(x,n) adds n to the exponent of float x.
+ rationalApproximation: (%, N) -> RN
+ ++ rationalApproximation(f, n) computes a rational approximation
+ ++ r to f with relative error \spad{< 10**(-n)}.
+ rationalApproximation: (%, N, N) -> RN
+ ++ rationalApproximation(f, n, b) computes a rational
+ ++ approximation r to f with relative error \spad{< b**(-n)}, that is
+ ++ \spad{|(r-f)/f| < b**(-n)}.
+ log2 : () -> %
+ ++ log2() returns \spad{ln 2}, i.e. \spad{0.6931471805...}.
+ log10: () -> %
+ ++ log10() returns \spad{ln 10}: \spad{2.3025809299...}.
+ exp1 : () -> %
+ ++ exp1() returns exp 1: \spad{2.7182818284...}.
+ atan : (%,%) -> %
+ ++ atan(x,y) computes the arc tangent from x with phase y.
+ log2 : % -> %
+ ++ log2(x) computes the logarithm for x to base 2.
+ log10: % -> %
+ ++ log10(x) computes the logarithm for x to base 10.
+ convert: SF -> %
+ ++ convert(x) converts a \spadtype{DoubleFloat} x to a \spadtype{Float}.
+ outputFloating: () -> Void
+ ++ outputFloating() sets the output mode to floating (scientific) notation, i.e.
+ ++ \spad{mantissa * 10 exponent} is displayed as \spad{0.mantissa E exponent}.
+ outputFloating: N -> Void
+ ++ outputFloating(n) sets the output mode to floating (scientific) notation
+ ++ with n significant digits displayed after the decimal point.
+ outputFixed: () -> Void
+ ++ outputFixed() sets the output mode to fixed point notation;
+ ++ the output will contain a decimal point.
+ outputFixed: N -> Void
+ ++ outputFixed(n) sets the output mode to fixed point notation,
+ ++ with n digits displayed after the decimal point.
+ outputGeneral: () -> Void
+ ++ outputGeneral() sets the output mode (default mode) to general
+ ++ notation; numbers will be displayed in either fixed or floating
+ ++ (scientific) notation depending on the magnitude.
+ outputGeneral: N -> Void
+ ++ outputGeneral(n) sets the output mode to general notation
+ ++ with n significant digits displayed.
+ outputSpacing: N -> Void
+ ++ outputSpacing(n) inserts a space after n (default 10) digits on output;
+ ++ outputSpacing(0) means no spaces are inserted.
+ arbitraryPrecision
+ arbitraryExponent
+ == add
+ BASE ==> 2
+ BITS:Reference(PI) := ref 68 -- 20 digits
+ LENGTH ==> INTEGER_-LENGTH$Lisp
+ ISQRT ==> approxSqrt$IntegerRoots(I)
+ Rep := Record( mantissa:I, exponent:I )
+ StoredConstant ==> Record( precision:PI, value:% )
+ UCA ==> Record( unit:%, coef:%, associate:% )
+ inc ==> increasePrecision
+ dec ==> decreasePrecision
+
+ -- local utility operations
+ shift2 : (I,I) -> I -- WSP: fix bug in shift
+ times : (%,%) -> % -- multiply x and y with no rounding
+ itimes: (I,%) -> % -- multiply by a small integer
+ chop: (%,PI) -> % -- chop x at p bits of precision
+ dvide: (%,%) -> % -- divide x by y with no rounding
+ square: (%,I) -> % -- repeated squaring with chopping
+ power: (%,I) -> % -- x ** n with chopping
+ plus: (%,%) -> % -- addition with no rounding
+ sub: (%,%) -> % -- subtraction with no rounding
+ negate: % -> % -- negation with no rounding
+ ceillog10base2: PI -> PI -- rational approximation
+ floorln2: PI -> PI -- rational approximation
+
+ atanSeries: % -> % -- atan(x) by taylor series |x| < 1/2
+ atanInverse: I -> % -- atan(1/n) for n an integer > 1
+ expInverse: I -> % -- exp(1/n) for n an integer
+ expSeries: % -> % -- exp(x) by taylor series |x| < 1/2
+ logSeries: % -> % -- log(x) by taylor series 1/2 < x < 2
+ sinSeries: % -> % -- sin(x) by taylor series |x| < 1/2
+ cosSeries: % -> % -- cos(x) by taylor series |x| < 1/2
+ piRamanujan: () -> % -- pi using Ramanujans series
+
+ writeOMFloat(dev: OpenMathDevice, x: %): Void ==
+ OMputApp(dev)
+ OMputSymbol(dev, "bigfloat1", "bigfloat")
+ OMputInteger(dev, mantissa x)
+ OMputInteger(dev, 2)
+ OMputInteger(dev, exponent x)
+ OMputEndApp(dev)
+
+ OMwrite(x: %): String ==
+ s: String := ""
+ sp := OM_-STRINGTOSTRINGPTR(s)$Lisp
+ dev: OpenMathDevice := OMopenString(sp pretend String, OMencodingXML)
+ OMputObject(dev)
+ writeOMFloat(dev, x)
+ OMputEndObject(dev)
+ OMclose(dev)
+ s := OM_-STRINGPTRTOSTRING(sp)$Lisp pretend String
+ s
+
+ OMwrite(x: %, wholeObj: Boolean): String ==
+ s: String := ""
+ sp := OM_-STRINGTOSTRINGPTR(s)$Lisp
+ dev: OpenMathDevice := OMopenString(sp pretend String, OMencodingXML)
+ if wholeObj then
+ OMputObject(dev)
+ writeOMFloat(dev, x)
+ if wholeObj then
+ OMputEndObject(dev)
+ OMclose(dev)
+ s := OM_-STRINGPTRTOSTRING(sp)$Lisp pretend String
+ s
+
+ OMwrite(dev: OpenMathDevice, x: %): Void ==
+ OMputObject(dev)
+ writeOMFloat(dev, x)
+ OMputEndObject(dev)
+
+ OMwrite(dev: OpenMathDevice, x: %, wholeObj: Boolean): Void ==
+ if wholeObj then
+ OMputObject(dev)
+ writeOMFloat(dev, x)
+ if wholeObj then
+ OMputEndObject(dev)
+
+ shift2(x,y) == sign(x)*shift(sign(x)*x,y)
+
+ asin x ==
+ zero? x => 0
+ negative? x => -asin(-x)
+-- one? x => pi()/2
+ (x = 1) => pi()/2
+ x > 1 => error "asin: argument > 1 in magnitude"
+ inc 5; r := atan(x/sqrt(sub(1,times(x,x)))); dec 5
+ normalize r
+
+ acos x ==
+ zero? x => pi()/2
+ negative? x => (inc 3; r := pi()-acos(-x); dec 3; normalize r)
+-- one? x => 0
+ (x = 1) => 0
+ x > 1 => error "acos: argument > 1 in magnitude"
+ inc 5; r := atan(sqrt(sub(1,times(x,x)))/x); dec 5
+ normalize r
+
+ atan(x,y) ==
+ x = 0 =>
+ y > 0 => pi()/2
+ y < 0 => -pi()/2
+ 0
+ -- Only count on first quadrant being on principal branch.
+ theta := atan abs(y/x)
+ if x < 0 then theta := pi() - theta
+ if y < 0 then theta := - theta
+ theta
+
+ atan x ==
+ zero? x => 0
+ negative? x => -atan(-x)
+ if x > 1 then
+ inc 4
+ r := if zero? fractionPart x and x < [bits(),0] then atanInverse wholePart x
+ else atan(1/x)
+ r := pi/2 - r
+ dec 4
+ return normalize r
+ -- make |x| < O( 2**(-sqrt p) ) < 1/2 to speed series convergence
+ -- by using the formula atan(x) = 2*atan(x/(1+sqrt(1+x**2)))
+ k := ISQRT (bits()-100)::I quo 5
+ k := max(0,2 + k + order x)
+ inc(2*k)
+ for i in 1..k repeat x := x/(1+sqrt(1+x*x))
+ t := atanSeries x
+ dec(2*k)
+ t := shift(t,k)
+ normalize t
+
+ atanSeries x ==
+ -- atan(x) = x (1 - x**2/3 + x**4/5 - x**6/7 + ...) |x| < 1
+ p := bits() + LENGTH bits() + 2
+ s:I := d:I := shift(1,p)
+ y := times(x,x)
+ t := m := - shift2(y.mantissa,y.exponent+p)
+ for i in 3.. by 2 while t ^= 0 repeat
+ s := s + t quo i
+ t := (m * t) quo d
+ x * [s,-p]
+
+ atanInverse n ==
+ -- compute atan(1/n) for an integer n > 1
+ -- atan n = 1/n - 1/n**3/3 + 1/n**5/4 - ...
+ -- pi = 16 atan(1/5) - 4 atan(1/239)
+ n2 := -n*n
+ e:I := bits() + LENGTH bits() + LENGTH n + 1
+ s:I := shift(1,e) quo n
+ t:I := s quo n2
+ for k in 3.. by 2 while t ^= 0 repeat
+ s := s + t quo k
+ t := t quo n2
+ normalize [s,-e]
+
+ sin x ==
+ s := sign x; x := abs x; p := bits(); inc 4
+ if x > [6,0] then (inc p; x := 2*pi*fractionPart(x/pi/2); bits p)
+ if x > [3,0] then (inc p; s := -s; x := x - pi; bits p)
+ if x > [3,-1] then (inc p; x := pi - x; dec p)
+ -- make |x| < O( 2**(-sqrt p) ) < 1/2 to speed series convergence
+ -- by using the formula sin(3*x/3) = 3 sin(x/3) - 4 sin(x/3)**3
+ -- the running time is O( sqrt p M(p) ) assuming |x| < 1
+ k := ISQRT (bits()-100)::I quo 4
+ k := max(0,2 + k + order x)
+ if k > 0 then (inc k; x := x / 3**k::N)
+ r := sinSeries x
+ for i in 1..k repeat r := itimes(3,r)-shift(r**3,2)
+ bits p
+ s * r
+
+ sinSeries x ==
+ -- sin(x) = x (1 - x**2/3! + x**4/5! - x**6/7! + ... |x| < 1/2
+ p := bits() + LENGTH bits() + 2
+ y := times(x,x)
+ s:I := d:I := shift(1,p)
+ m:I := - shift2(y.mantissa,y.exponent+p)
+ t:I := m quo 6
+ for i in 4.. by 2 while t ^= 0 repeat
+ s := s + t
+ t := (m * t) quo (i*(i+1))
+ t := t quo d
+ x * [s,-p]
+
+ cos x ==
+ s:I := 1; x := abs x; p := bits(); inc 4
+ if x > [6,0] then (inc p; x := 2*pi*fractionPart(x/pi/2); dec p)
+ if x > [3,0] then (inc p; s := -s; x := x-pi; dec p)
+ if x > [1,0] then
+ -- take care of the accuracy problem near pi/2
+ inc p; x := pi/2-x; bits p; x := normalize x
+ return (s * sin x)
+ -- make |x| < O( 2**(-sqrt p) ) < 1/2 to speed series convergence
+ -- by using the formula cos(2*x/2) = 2 cos(x/2)**2 - 1
+ -- the running time is O( sqrt p M(p) ) assuming |x| < 1
+ k := ISQRT (bits()-100)::I quo 3
+ k := max(0,2 + k + order x)
+ -- need to increase precision by more than k, otherwise recursion
+ -- causes loss of accuracy.
+ -- Michael Monagan suggests adding a factor of log(k)
+ if k > 0 then (inc(k+length(k)**2); x := shift(x,-k))
+ r := cosSeries x
+ for i in 1..k repeat r := shift(r*r,1)-1
+ bits p
+ s * r
+
+
+
+ cosSeries x ==
+ -- cos(x) = 1 - x**2/2! + x**4/4! - x**6/6! + ... |x| < 1/2
+ p := bits() + LENGTH bits() + 1
+ y := times(x,x)
+ s:I := d:I := shift(1,p)
+ m:I := - shift2(y.mantissa,y.exponent+p)
+ t:I := m quo 2
+ for i in 3.. by 2 while t ^= 0 repeat
+ s := s + t
+ t := (m * t) quo (i*(i+1))
+ t := t quo d
+ normalize [s,-p]
+
+ tan x ==
+ s := sign x; x := abs x; p := bits(); inc 6
+ if x > [3,0] then (inc p; x := pi()*fractionPart(x/pi()); dec p)
+ if x > [3,-1] then (inc p; x := pi()-x; s := -s; dec p)
+ if x > 1 then (c := cos x; t := sqrt(1-c*c)/c)
+ else (c := sin x; t := c/sqrt(1-c*c))
+ bits p
+ s * t
+
+ P:StoredConstant := [1,[1,2]]
+ pi() ==
+ -- We use Ramanujan's identity to compute pi.
+ -- The running time is quadratic in the precision.
+ -- This is about twice as fast as Machin's identity on Lisp/VM
+ -- pi = 16 atan(1/5) - 4 atan(1/239)
+ bits() <= P.precision => normalize P.value
+ (P := [bits(), piRamanujan()]) value
+
+ piRamanujan() ==
+ -- Ramanujans identity for 1/pi
+ -- Reference: Shanks and Wrench, Math Comp, 1962
+ -- "Calculation of pi to 100,000 Decimals".
+ n := bits() + LENGTH bits() + 11
+ t:I := shift(1,n) quo 882
+ d:I := 4*882**2
+ s:I := 0
+ for i in 2.. by 2 for j in 1123.. by 21460 while t ^= 0 repeat
+ s := s + j*t
+ m := -(i-1)*(2*i-1)*(2*i-3)
+ t := (m*t) quo (d*i**3)
+ 1 / [s,-n-2]
+
+ sinh x ==
+ zero? x => 0
+ lost:I := max(- order x,0)
+ 2*lost > bits() => x
+ inc(5+lost); e := exp x; s := (e-1/e)/2; dec(5+lost)
+ normalize s
+
+ cosh x ==
+ (inc 5; e := exp x; c := (e+1/e)/2; dec 5; normalize c)
+
+ tanh x ==
+ zero? x => 0
+ lost:I := max(- order x,0)
+ 2*lost > bits() => x
+ inc(6+lost); e := exp x; e := e*e; t := (e-1)/(e+1); dec(6+lost)
+ normalize t
+
+ asinh x ==
+ p := min(0,order x)
+ if zero? x or 2*p < -bits() then return x
+ inc(5-p); r := log(x+sqrt(1+x*x)); dec(5-p)
+ normalize r
+
+ acosh x ==
+ if x < 1 then error "invalid argument to acosh"
+ inc 5; r := log(x+sqrt(sub(times(x,x),1))); dec 5
+ normalize r
+
+ atanh x ==
+ if x > 1 or x < -1 then error "invalid argument to atanh"
+ p := min(0,order x)
+ if zero? x or 2*p < -bits() then return x
+ inc(5-p); r := log((x+1)/(1-x))/2; dec(5-p)
+ normalize r
+
+ log x ==
+ negative? x => error "negative log"
+ zero? x => error "log 0 generated"
+ p := bits(); inc 5
+ -- apply log(x) = n log 2 + log(x/2**n) so that 1/2 < x < 2
+ if (n := order x) < 0 then n := n+1
+ l := if n = 0 then 0 else (x := shift(x,-n); n * log2)
+ -- speed the series convergence by finding m and k such that
+ -- | exp(m/2**k) x - 1 | < 1 / 2 ** O(sqrt p)
+ -- write log(exp(m/2**k) x) as m/2**k + log x
+ k := ISQRT (p-100)::I quo 3
+ if k > 1 then
+ k := max(1,k+order(x-1))
+ inc k
+ ek := expInverse (2**k::N)
+ dec(p quo 2); m := order square(x,k); inc(p quo 2)
+ m := (6847196937 * m) quo 9878417065 -- m := m log 2
+ x := x * ek ** (-m)
+ l := l + [m,-k]
+ l := l + logSeries x
+ bits p
+ normalize l
+
+ logSeries x ==
+ -- log(x) = 2 y (1 + y**2/3 + y**4/5 ...) for y = (x-1) / (x+1)
+ -- given 1/2 < x < 2 on input we have -1/3 < y < 1/3
+ p := bits() + (g := LENGTH bits() + 3)
+ inc g; y := (x-1)/(x+1); dec g
+ s:I := d:I := shift(1,p)
+ z := times(y,y)
+ t := m := shift2(z.mantissa,z.exponent+p)
+ for i in 3.. by 2 while t ^= 0 repeat
+ s := s + t quo i
+ t := m * t quo d
+ y * [s,1-p]
+
+ L2:StoredConstant := [1,1]
+ log2() ==
+ -- log x = 2 * sum( ((x-1)/(x+1))**(2*k+1)/(2*k+1), k=1.. )
+ -- log 2 = 2 * sum( 1/9**k / (2*k+1), k=0..n ) / 3
+ n := bits() :: N
+ n <= L2.precision => normalize L2.value
+ n := n + LENGTH n + 3 -- guard bits
+ s:I := shift(1,n+1) quo 3
+ t:I := s quo 9
+ for k in 3.. by 2 while t ^= 0 repeat
+ s := s + t quo k
+ t := t quo 9
+ L2 := [bits(),[s,-n]]
+ normalize L2.value
+
+ L10:StoredConstant := [1,[1,1]]
+ log10() ==
+ -- log x = 2 * sum( ((x-1)/(x+1))**(2*k+1)/(2*k+1), k=0.. )
+ -- log 5/4 = 2 * sum( 1/81**k / (2*k+1), k=0.. ) / 9
+ n := bits() :: N
+ n <= L10.precision => normalize L10.value
+ n := n + LENGTH n + 5 -- guard bits
+ s:I := shift(1,n+1) quo 9
+ t:I := s quo 81
+ for k in 3.. by 2 while t ^= 0 repeat
+ s := s + t quo k
+ t := t quo 81
+ -- We have log 10 = log 5 + log 2 and log 5/4 = log 5 - 2 log 2
+ inc 2; L10 := [bits(),[s,-n] + 3*log2]; dec 2
+ normalize L10.value
+
+ log2(x) == (inc 2; r := log(x)/log2; dec 2; normalize r)
+ log10(x) == (inc 2; r := log(x)/log10; dec 2; normalize r)
+
+ exp(x) ==
+ -- exp(n+x) = exp(1)**n exp(x) for n such that |x| < 1
+ p := bits(); inc 5; e1:% := 1
+ if (n := wholePart x) ^= 0 then
+ inc LENGTH n; e1 := exp1 ** n; dec LENGTH n
+ x := fractionPart x
+ if zero? x then (bits p; return normalize e1)
+ -- make |x| < O( 2**(-sqrt p) ) < 1/2 to speed series convergence
+ -- by repeated use of the formula exp(2*x/2) = exp(x/2)**2
+ -- results in an overall running time of O( sqrt p M(p) )
+ k := ISQRT (p-100)::I quo 3
+ k := max(0,2 + k + order x)
+ if k > 0 then (inc k; x := shift(x,-k))
+ e := expSeries x
+ if k > 0 then e := square(e,k)
+ bits p
+ e * e1
+
+ expSeries x ==
+ -- exp(x) = 1 + x + x**2/2 + ... + x**i/i! valid for all x
+ p := bits() + LENGTH bits() + 1
+ s:I := d:I := shift(1,p)
+ t:I := n:I := shift2(x.mantissa,x.exponent+p)
+ for i in 2.. while t ^= 0 repeat
+ s := s + t
+ t := (n * t) quo i
+ t := t quo d
+ normalize [s,-p]
+
+ expInverse k ==
+ -- computes exp(1/k) via continued fraction
+ p0:I := 2*k+1; p1:I := 6*k*p0+1
+ q0:I := 2*k-1; q1:I := 6*k*q0+1
+ for i in 10*k.. by 4*k while 2 * LENGTH p0 < bits() repeat
+ (p0,p1) := (p1,i*p1+p0)
+ (q0,q1) := (q1,i*q1+q0)
+ dvide([p1,0],[q1,0])
+
+ E:StoredConstant := [1,[1,1]]
+ exp1() ==
+ if bits() > E.precision then E := [bits(),expInverse 1]
+ normalize E.value
+
+ sqrt x ==
+ negative? x => error "negative sqrt"
+ m := x.mantissa; e := x.exponent
+ l := LENGTH m
+ p := 2 * bits() - l + 2
+ if odd?(e-l) then p := p - 1
+ i := shift2(x.mantissa,p)
+ -- ISQRT uses a variable precision newton iteration
+ i := ISQRT i
+ normalize [i,(e-p) quo 2]
+
+ bits() == BITS()
+ bits(n) == (t := bits(); BITS() := n; t)
+ precision() == bits()
+ precision(n) == bits(n)
+ increasePrecision n == (b := bits(); bits((b + n)::PI); b)
+ decreasePrecision n == (b := bits(); bits((b - n)::PI); b)
+ ceillog10base2 n == ((13301 * n + 4003) quo 4004) :: PI
+ digits() == max(1,4004 * (bits()-1) quo 13301)::PI
+ digits(n) == (t := digits(); bits (1 + ceillog10base2 n); t)
+
+ order(a) == LENGTH a.mantissa + a.exponent - 1
+ relerror(a,b) == order((a-b)/b)
+ 0 == [0,0]
+ 1 == [1,0]
+ base() == BASE
+ mantissa x == x.mantissa
+ exponent x == x.exponent
+ one? a == a = 1
+ zero? a == zero?(a.mantissa)
+ negative? a == negative?(a.mantissa)
+ positive? a == positive?(a.mantissa)
+
+ chop(x,p) ==
+ e : I := LENGTH x.mantissa - p
+ if e > 0 then x := [shift2(x.mantissa,-e),x.exponent+e]
+ x
+ float(m,e) == normalize [m,e]
+ float(m,e,b) ==
+ m = 0 => 0
+ inc 4; r := m * [b,0] ** e; dec 4
+ normalize r
+ normalize x ==
+ m := x.mantissa
+ m = 0 => 0
+ e : I := LENGTH m - bits()
+ if e > 0 then
+ y := shift2(m,1-e)
+ if odd? y then
+ y := (if y>0 then y+1 else y-1) quo 2
+ if LENGTH y > bits() then
+ y := y quo 2
+ e := e+1
+ else y := y quo 2
+ x := [y,x.exponent+e]
+ x
+ shift(x:%,n:I) == [x.mantissa,x.exponent+n]
+
+ x = y ==
+ order x = order y and sign x = sign y and zero? (x - y)
+ x < y ==
+ y.mantissa = 0 => x.mantissa < 0
+ x.mantissa = 0 => y.mantissa > 0
+ negative? x and positive? y => true
+ negative? y and positive? x => false
+ order x < order y => positive? x
+ order x > order y => negative? x
+ negative? (x-y)
+
+ abs x == if negative? x then -x else normalize x
+ ceiling x ==
+ if negative? x then return (-floor(-x))
+ if zero? fractionPart x then x else truncate x + 1
+ wholePart x == shift2(x.mantissa,x.exponent)
+ floor x == if negative? x then -ceiling(-x) else truncate x
+ round x == (half := [sign x,-1]; truncate(x + half))
+ sign x == if x.mantissa < 0 then -1 else 1
+ truncate x ==
+ if x.exponent >= 0 then return x
+ normalize [shift2(x.mantissa,x.exponent),0]
+ recip(x) == if x=0 then "failed" else 1/x
+ differentiate x == 0
+
+ - x == normalize negate x
+ negate x == [-x.mantissa,x.exponent]
+ x + y == normalize plus(x,y)
+ x - y == normalize plus(x,negate y)
+ sub(x,y) == plus(x,negate y)
+ plus(x,y) ==
+ mx := x.mantissa; my := y.mantissa
+ mx = 0 => y
+ my = 0 => x
+ ex := x.exponent; ey := y.exponent
+ ex = ey => [mx+my,ex]
+ de := ex + LENGTH mx - ey - LENGTH my
+ de > bits()+1 => x
+ de < -(bits()+1) => y
+ if ex < ey then (mx,my,ex,ey) := (my,mx,ey,ex)
+ mw := my + shift2(mx,ex-ey)
+ [mw,ey]
+
+ x:% * y:% == normalize times (x,y)
+ x:I * y:% ==
+ if LENGTH x > bits() then normalize [x,0] * y
+ else normalize [x * y.mantissa,y.exponent]
+ x:% / y:% == normalize dvide(x,y)
+ x:% / y:I ==
+ if LENGTH y > bits() then x / normalize [y,0] else x / [y,0]
+ inv x == 1 / x
+
+ times(x:%,y:%) == [x.mantissa * y.mantissa, x.exponent + y.exponent]
+ itimes(n:I,y:%) == [n * y.mantissa,y.exponent]
+
+ dvide(x,y) ==
+ ew := LENGTH y.mantissa - LENGTH x.mantissa + bits() + 1
+ mw := shift2(x.mantissa,ew) quo y.mantissa
+ ew := x.exponent - y.exponent - ew
+ [mw,ew]
+
+ square(x,n) ==
+ ma := x.mantissa; ex := x.exponent
+ for k in 1..n repeat
+ ma := ma * ma; ex := ex + ex
+ l:I := bits()::I - LENGTH ma
+ ma := shift2(ma,l); ex := ex - l
+ [ma,ex]
+
+ power(x,n) ==
+ y:% := 1; z:% := x
+ repeat
+ if odd? n then y := chop( times(y,z), bits() )
+ if (n := n quo 2) = 0 then return y
+ z := chop( times(z,z), bits() )
+
+ x:% ** y:% ==
+ x = 0 =>
+ y = 0 => error "0**0 is undefined"
+ y < 0 => error "division by 0"
+ y > 0 => 0
+ y = 0 => 1
+ y = 1 => x
+ x = 1 => 1
+ p := abs order y + 5
+ inc p; r := exp(y*log(x)); dec p
+ normalize r
+
+ x:% ** r:RN ==
+ x = 0 =>
+ r = 0 => error "0**0 is undefined"
+ r < 0 => error "division by 0"
+ r > 0 => 0
+ r = 0 => 1
+ r = 1 => x
+ x = 1 => 1
+ n := numer r
+ d := denom r
+ negative? x =>
+ odd? d =>
+ odd? n => return -((-x)**r)
+ return ((-x)**r)
+ error "negative root"
+ if d = 2 then
+ inc LENGTH n; y := sqrt(x); y := y**n; dec LENGTH n
+ return normalize y
+ y := [n,0]/[d,0]
+ x ** y
+
+ x:% ** n:I ==
+ x = 0 =>
+ n = 0 => error "0**0 is undefined"
+ n < 0 => error "division by 0"
+ n > 0 => 0
+ n = 0 => 1
+ n = 1 => x
+ x = 1 => 1
+ p := bits()
+ bits(p + LENGTH n + 2)
+ y := power(x,abs n)
+ if n < 0 then y := dvide(1,y)
+ bits p
+ normalize y
+
+ -- Utility routines for conversion to decimal
+ ceilLength10: I -> I
+ chop10: (%,I) -> %
+ convert10:(%,I) -> %
+ floorLength10: I -> I
+ length10: I -> I
+ normalize10: (%,I) -> %
+ quotient10: (%,%,I) -> %
+ power10: (%,I,I) -> %
+ times10: (%,%,I) -> %
+
+ convert10(x,d) ==
+ m := x.mantissa; e := x.exponent
+ --!! deal with bits here
+ b := bits(); (q,r) := divide(abs e, b)
+ b := 2**b::N; r := 2**r::N
+ -- compute 2**e = b**q * r
+ h := power10([b,0],q,d+5)
+ h := chop10([r*h.mantissa,h.exponent],d+5)
+ if e < 0 then h := quotient10([m,0],h,d)
+ else times10([m,0],h,d)
+
+ ceilLength10 n == 146 * LENGTH n quo 485 + 1
+ floorLength10 n == 643 * LENGTH n quo 2136
+-- length10 n == DECIMAL_-LENGTH(n)$Lisp
+ length10 n ==
+ ln := LENGTH(n:=abs n)
+ upper := 76573 * ln quo 254370
+ lower := 21306 * (ln-1) quo 70777
+ upper = lower => upper + 1
+ n := n quo (10**lower::N)
+ while n >= 10 repeat
+ n:= n quo 10
+ lower := lower + 1
+ lower + 1
+
+ chop10(x,p) ==
+ e : I := floorLength10 x.mantissa - p
+ if e > 0 then x := [x.mantissa quo 10**e::N,x.exponent+e]
+ x
+ normalize10(x,p) ==
+ ma := x.mantissa
+ ex := x.exponent
+ e : I := length10 ma - p
+ if e > 0 then
+ ma := ma quo 10**(e-1)::N
+ ex := ex + e
+ (ma,r) := divide(ma, 10)
+ if r > 4 then
+ ma := ma + 1
+ if ma = 10**p::N then (ma := 1; ex := ex + p)
+ [ma,ex]
+ times10(x,y,p) == normalize10(times(x,y),p)
+ quotient10(x,y,p) ==
+ ew := floorLength10 y.mantissa - ceilLength10 x.mantissa + p + 2
+ if ew < 0 then ew := 0
+ mw := (x.mantissa * 10**ew::N) quo y.mantissa
+ ew := x.exponent - y.exponent - ew
+ normalize10([mw,ew],p)
+ power10(x,n,d) ==
+ x = 0 => 0
+ n = 0 => 1
+ n = 1 => x
+ x = 1 => 1
+ p:I := d + LENGTH n + 1
+ e:I := n
+ y:% := 1
+ z:% := x
+ repeat
+ if odd? e then y := chop10(times(y,z),p)
+ if (e := e quo 2) = 0 then return y
+ z := chop10(times(z,z),p)
+
+ --------------------------------
+ -- Output routines for Floats --
+ --------------------------------
+ zero ==> char("0")
+ separator ==> space()$Character
+
+ SPACING : Reference(N) := ref 10
+ OUTMODE : Reference(S) := ref "general"
+ OUTPREC : Reference(I) := ref(-1)
+
+ fixed : % -> S
+ floating : % -> S
+ general : % -> S
+
+ padFromLeft(s:S):S ==
+ zero? SPACING() => s
+ n:I := #s - 1
+ t := new( (n + 1 + n quo SPACING()) :: N , separator )
+ for i in 0..n for j in minIndex t .. repeat
+ t.j := s.(i + minIndex s)
+ if (i+1) rem SPACING() = 0 then j := j+1
+ t
+ padFromRight(s:S):S ==
+ SPACING() = 0 => s
+ n:I := #s - 1
+ t := new( (n + 1 + n quo SPACING()) :: N , separator )
+ for i in n..0 by -1 for j in maxIndex t .. by -1 repeat
+ t.j := s.(i + minIndex s)
+ if (n-i+1) rem SPACING() = 0 then j := j-1
+ t
+
+ fixed f ==
+ zero? f => "0.0"
+ zero? exponent f =>
+ padFromRight concat(convert(mantissa f)@S, ".0")
+ negative? f => concat("-", fixed abs f)
+ d := if OUTPREC() = -1 then digits::I else OUTPREC()
+-- g := convert10(abs f,digits); m := g.mantissa; e := g.exponent
+ g := convert10(abs f,d); m := g.mantissa; e := g.exponent
+ if OUTPREC() ^= -1 then
+ -- round g to OUTPREC digits after the decimal point
+ l := length10 m
+ if -e > OUTPREC() and -e < 2*digits::I then
+ g := normalize10(g,l+e+OUTPREC())
+ m := g.mantissa; e := g.exponent
+ s := convert(m)@S; n := #s; o := e+n
+ p := if OUTPREC() = -1 then n::I else OUTPREC()
+ t:S
+ if e >= 0 then
+ s := concat(s, new(e::N, zero))
+ t := ""
+ else if o <= 0 then
+ t := concat(new((-o)::N,zero), s)
+ s := "0"
+ else
+ t := s(o + minIndex s .. n + minIndex s - 1)
+ s := s(minIndex s .. o + minIndex s - 1)
+ n := #t
+ if OUTPREC() = -1 then
+ t := rightTrim(t,zero)
+ if t = "" then t := "0"
+ else if n > p then t := t(minIndex t .. p + minIndex t- 1)
+ else t := concat(t, new((p-n)::N,zero))
+ concat(padFromRight s, concat(".", padFromLeft t))
+
+ floating f ==
+ zero? f => "0.0"
+ negative? f => concat("-", floating abs f)
+ t:S := if zero? SPACING() then "E" else " E "
+ zero? exponent f =>
+ s := convert(mantissa f)@S
+ concat ["0.", padFromLeft s, t, convert(#s)@S]
+ -- base conversion to decimal rounded to the requested precision
+ d := if OUTPREC() = -1 then digits::I else OUTPREC()
+ g := convert10(f,d); m := g.mantissa; e := g.exponent
+ -- I'm assuming that length10 m = # s given n > 0
+ s := convert(m)@S; n := #s; o := e+n
+ s := padFromLeft s
+ concat ["0.", s, t, convert(o)@S]
+
+ general(f) ==
+ zero? f => "0.0"
+ negative? f => concat("-", general abs f)
+ d := if OUTPREC() = -1 then digits::I else OUTPREC()
+ zero? exponent f =>
+ d := d + 1
+ s := convert(mantissa f)@S
+ OUTPREC() ^= -1 and (e := #s) > d =>
+ t:S := if zero? SPACING() then "E" else " E "
+ concat ["0.", padFromLeft s, t, convert(e)@S]
+ padFromRight concat(s, ".0")
+ -- base conversion to decimal rounded to the requested precision
+ g := convert10(f,d); m := g.mantissa; e := g.exponent
+ -- I'm assuming that length10 m = # s given n > 0
+ s := convert(m)@S; n := #s; o := n + e
+ -- Note: at least one digit is displayed after the decimal point
+ -- and trailing zeroes after the decimal point are dropped
+ if o > 0 and o <= max(n,d) then
+ -- fixed format: add trailing zeroes before the decimal point
+ if o > n then s := concat(s, new((o-n)::N,zero))
+ t := rightTrim(s(o + minIndex s .. n + minIndex s - 1), zero)
+ if t = "" then t := "0" else t := padFromLeft t
+ s := padFromRight s(minIndex s .. o + minIndex s - 1)
+ concat(s, concat(".", t))
+ else if o <= 0 and o >= -5 then
+ -- fixed format: up to 5 leading zeroes after the decimal point
+ concat("0.",padFromLeft concat(new((-o)::N,zero),rightTrim(s,zero)))
+ else
+ -- print using E format written 0.mantissa E exponent
+ t := padFromLeft rightTrim(s,zero)
+ s := if zero? SPACING() then "E" else " E "
+ concat ["0.", t, s, convert(e+n)@S]
+
+ outputSpacing n == SPACING() := n
+ outputFixed() == (OUTMODE() := "fixed"; OUTPREC() := -1)
+ outputFixed n == (OUTMODE() := "fixed"; OUTPREC() := n::I)
+ outputGeneral() == (OUTMODE() := "general"; OUTPREC() := -1)
+ outputGeneral n == (OUTMODE() := "general"; OUTPREC() := n::I)
+ outputFloating() == (OUTMODE() := "floating"; OUTPREC() := -1)
+ outputFloating n == (OUTMODE() := "floating"; OUTPREC() := n::I)
+
+ convert(f):S ==
+ b:Integer :=
+ OUTPREC() = -1 and not zero? f =>
+ bits(length(abs mantissa f)::PositiveInteger)
+ 0
+ s :=
+ OUTMODE() = "fixed" => fixed f
+ OUTMODE() = "floating" => floating f
+ OUTMODE() = "general" => general f
+ empty()$String
+ if b > 0 then bits(b::PositiveInteger)
+ s = empty()$String => error "bad output mode"
+ s
+
+ coerce(f):OutputForm ==
+ f >= 0 => message(convert(f)@S)
+ - (coerce(-f)@OutputForm)
+
+ convert(f):InputForm ==
+ convert [convert("float"::Symbol), convert mantissa f,
+ convert exponent f, convert base()]$List(InputForm)
+
+ -- Conversion routines
+ convert(x:%):Float == x pretend Float
+ convert(x:%):SF == makeSF(x.mantissa,x.exponent)$Lisp
+ coerce(x:%):SF == convert(x)@SF
+ convert(sf:SF):% == float(mantissa sf,exponent sf,base()$SF)
+
+ retract(f:%):RN == rationalApproximation(f,(bits()-1)::N,BASE)
+
+ retractIfCan(f:%):Union(RN, "failed") ==
+ rationalApproximation(f,(bits()-1)::N,BASE)
+
+ retract(f:%):I ==
+ (f = (n := wholePart f)::%) => n
+ error "Not an integer"
+
+ retractIfCan(f:%):Union(I, "failed") ==
+ (f = (n := wholePart f)::%) => n
+ "failed"
+
+ rationalApproximation(f,d) == rationalApproximation(f,d,10)
+
+ rationalApproximation(f,d,b) ==
+ t: Integer
+ nu := f.mantissa; ex := f.exponent
+ if ex >= 0 then return ((nu*BASE**(ex::N))/1)
+ de := BASE**((-ex)::N)
+ if b < 2 then error "base must be > 1"
+ tol := b**d
+ s := nu; t := de
+ p0,p1,q0,q1 : Integer
+ p0 := 0; p1 := 1; q0 := 1; q1 := 0
+ repeat
+ (q,r) := divide(s, t)
+ p2 := q*p1+p0
+ q2 := q*q1+q0
+ if r = 0 or tol*abs(nu*q2-de*p2) < de*abs(p2) then return (p2/q2)
+ (p0,p1) := (p1,p2)
+ (q0,q1) := (q1,q2)
+ (s,t) := (t,r)
+
+@
+
+--% Float: arbitrary precision floating point arithmetic domain
+
+\section{License}
+<<license>>=
+--Copyright (c) 1991-2002, The Numerical ALgorithms Group Ltd.
+--All rights reserved.
+--
+--Redistribution and use in source and binary forms, with or without
+--modification, are permitted provided that the following conditions are
+--met:
+--
+-- - Redistributions of source code must retain the above copyright
+-- notice, this list of conditions and the following disclaimer.
+--
+-- - Redistributions in binary form must reproduce the above copyright
+-- notice, this list of conditions and the following disclaimer in
+-- the documentation and/or other materials provided with the
+-- distribution.
+--
+-- - Neither the name of The Numerical ALgorithms Group Ltd. nor the
+-- names of its contributors may be used to endorse or promote products
+-- derived from this software without specific prior written permission.
+--
+--THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
+--IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
+--TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
+--PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
+--OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+--EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+--PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+--PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+--LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
+--NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+--SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+@
+<<*>>=
+<<license>>
+
+<<domain FLOAT Float>>
+@
+\eject
+\begin{thebibliography}{99}
+\bibitem{1} nothing
+\end{thebibliography}
+\end{document}