# B.1. Numerical Errors¶

Real-valued numbers are represented on a computer with a finite-precision binary representation (usually 32 or 64 bits). Floating-point representation is the most common, and it can express a huge range of values by devoting some of those bits to a value's most significant digits (the mantissa) and others to its order of magnitude (the exponent). Of course, there are many numbers, like 1/3 or $\pi$, whose representations are infinite in base 10 (0.3333... repeating forever, or 3.14159... with no repetition) as well as in binary. The use of finite precision to approximate these values causes a small but sometimes troublesome discrepancy between the "true" value and its representation on a computer. This is known as truncation error and is a form of numerical error.
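For instance, the decimal value 0.1 has no finite binary representation, so even a simple sum carries truncation error. The following minimal sketch illustrates this, using `math.isclose` (available in Python 3.5+) as one standard way to compare floats approximately rather than exactly:

```python
import math

# 0.1 and 0.2 are both stored inexactly in binary, so their sum is not
# exactly the stored value of 0.3.
x = 0.1 + 0.2
print(repr(x))               # shows the accumulated truncation error
print(x == 0.3)              # False: exact comparison fails

# When comparing floats, test approximate rather than exact equality.
print(math.isclose(x, 0.3))  # True
```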

Other forms of numerical error occur when a computer performs certain mathematical operations on fixed-precision numbers. Truncation error is again an issue: for example, when adding a very small number to a very large number, the least significant digits of the small number are overwhelmed by the significant digits of the large one. In 64-bit arithmetic with Python 2.7, you may be surprised to see that although 1.0/3.0 evaluates to 0.3333333333333333, the result of 10.0/3.0 is 3.3333333333333335! Hence, the result of (1.0/3.0)*10.0 - 10.0/3.0 is not identically 0, but some small nonzero number.

In [1]:
from __future__ import print_function # needed for Python 2 / 3 compatibility

print(repr(1.0/3.0))   # we use repr to show all digits, because plain print truncates the output
print(repr(10.0/3.0))
print((1.0/3.0)*10.0 - 10.0/3.0)

0.3333333333333333
3.3333333333333335
-4.440892098500626e-16


A more significant type of error occurs when truncation errors grow as a result of compounding operations. The following three operations are the most common sources of compounding numerical error:

1. Dividing by a small number.

2. Subtracting two nearly-equal numbers.

3. Evaluating a mathematical function with argument(s) near the boundary of the domain on which it is defined.

Other sources are overflow and underflow, in which the result of an operation exceeds the maximum or falls below the minimum representable number, respectively.
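These failure modes are easy to reproduce directly. The following minimal sketch illustrates cancellation when subtracting two nearly-equal numbers, as well as overflow and underflow of 64-bit floats:

```python
import math

# Catastrophic cancellation: subtracting nearly-equal numbers wipes out
# significant digits.  Mathematically, (1 + 1e-15) - 1 equals 1e-15 exactly,
# but the stored result retains only a few significant digits of 1e-15.
diff = (1.0 + 1e-15) - 1.0
print(repr(diff))        # close to, but not exactly, 1e-15

# Overflow: exceeding the largest representable 64-bit float (about 1.8e308)
# yields the special value inf.
print(1e308 * 10.0)      # inf

# Underflow: results below the smallest positive representable float
# (about 4.9e-324) round down to 0.
print(1e-308 / 1e20)     # 0.0
```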

A mathematical subroutine that is designed to prevent compounding numerical error is called numerically stable (or robust). As an illustration, consider solving for $x$ in the quadratic equation $$a x^2 + b x + c = 0.$$ As you recall from algebra, the solutions to this equation are $$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a} \label{eq:QuadraticEquation}$$ if $a \neq 0$ and $b^2 - 4ac \geq 0$. If $a = 0$, the single solution is $x = -c/b$ (assuming $b \neq 0$), and if $b^2 - 4ac < 0$ no real solutions exist.

The main issue to address is the division by $2a$ (and similarly, the test for $a=0$) if $a$ is small. When $a$ has a small absolute value, both the numerator and the denominator of ($\ref{eq:QuadraticEquation}$) will be close to 0, causing a direct evaluation of ($\ref{eq:QuadraticEquation}$) to suffer from large numerical errors, as shown below.

In [5]:
from __future__ import division  #Needed for Python 2/3 compatibility, in case you give integers to quad
import math

"""Returns a tuple of 0, 1, or 2 solutions to the quadratic equation a^2 x + b x + c = 0"""
if a == 0:
return (-c/b,)
det = b**2 - 4*a*c
if det < 0:
return ()
if det == 0:
return (-b/(2*a),)
sqrtdet = math.sqrt(det)
return ((-b-sqrtdet)/(2*a),(-b+sqrtdet)/(2*a))

def test(a,b,c,fn):
    xs = fn(a,b,c)
    print("  ",xs)
    print("  errors",[a*x**2 + b*x + c for x in xs])

print("Solutions to")
print("x^2-3x-4=0:")
print("x^2-2x+1=0:")
print("x^2+x+1=0:")
print()
print("Now, some very similar equations")
print("x+1=0:")
#uh oh, numerical errors
print("10^-15*x^2+x+1=0:")
#...are getting worse
print("10^-17*x^2+x+1=0:")

Solutions to
x^2-3x-4=0:
(-1.0, 4.0)
errors [0.0, 0.0]
x^2-2x+1=0:
(1.0,)
errors [0.0]
x^2+x+1=0:
()
errors []

Now, some very similar equations
x+1=0:
(-1.0,)
errors [0.0]
10^-15*x^2+x+1=0:
(-999999999999998.9, -0.9992007221626408)
errors [0.0, 0.0007992778373602238]
10^-17*x^2+x+1=0:
(-1e+17, 0.0)
errors [1.0, 1.0]


However, if we cleverly rearrange ($\ref{eq:QuadraticEquation}$), we obtain a more numerically stable expression. Consider multiplying both the numerator and the denominator by $-b \mp \sqrt{b^2-4ac}$: $$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a} \cdot \frac{-b \mp \sqrt{b^2-4ac}}{-b \mp \sqrt{b^2-4ac}}$$ which evaluates to $$x = \frac{2c}{ -b \mp \sqrt{b^2-4ac} }. \label{eq:QuadraticEquation2}$$ Observe that the denominator has two possibilities due to the sign choice. When $a$ is small, the square root term is close to $\sqrt{b^2} = |b|$, so one of the two denominators (the one in which the signs of $-b$ and the square root term oppose) will be close to zero, while the other is well conditioned. Hence, we can choose the well conditioned solution from ($\ref{eq:QuadraticEquation2}$), and the other well conditioned solution from ($\ref{eq:QuadraticEquation}$). The following code does just that.

In [7]:
def quad2(a,b,c):
    """Returns a tuple of 0, 1, or 2 solutions to the quadratic equation a x^2 + b x + c = 0"""
    if a == 0:
        return (-c/b,)
    det = b**2 - 4*a*c
    if det < 0:
        return ()
    if det == 0:
        if abs(a) > abs(b):
            #version 1
            return (-b/(2*a),)
        else:
            #version 2
            return (-2*c/b,)
    sqrtdet = math.sqrt(det)
    signb = 1 if b >= 0 else -1
    num = -b-signb*sqrtdet
    return (num/(2*a),2*c/num)

def test(a,b,c,fn):
    xs = fn(a,b,c)
    print("  ",xs)
    print("  errors",[a*x**2 + b*x + c for x in xs])

print("Solutions to")
print("x^2-3x-4=0:")
print("x^2-2x+1=0:")
print("x^2+x+1=0:")
print()
print("Now, some very similar equations")
print("x+1=0:")
#numerical errors are better
print("10^-15*x^2+x+1=0:")
#and at least are correct for the solution that's not so huge
print("10^-17*x^2+x+1=0:")

Solutions to
x^2-3x-4=0:
(4.0, -1.0)
errors [0.0, 0.0]
x^2-2x+1=0:
(1.0,)
errors [0.0]
x^2+x+1=0:
()
errors []

Now, some very similar equations
x+1=0:
(-1.0,)
errors [0.0]
10^-15*x^2+x+1=0:
(-999999999999998.9, -1.000000000000001)
errors [0.0, 0.0]
10^-17*x^2+x+1=0:
(-1e+17, -1.0)
errors [1.0, 0.0]
