Real-valued numbers are represented on a computer with a finite-precision
binary representation (usually 32 or 64 bits). Floating-point
representation is the most common; it expresses a huge range of values
by devoting some of those bits to a value's most significant digits (the
*mantissa*) and others to its magnitude (the *exponent*). Of course,
many numbers, like 1/3 or $\pi$, have infinite representations in base
10 (0.3333... repeating forever, or 3.14159... with no repetition) as
well as in binary. Approximating these values with finite precision
causes a small but sometimes troublesome discrepancy between the "true"
value and its representation on a computer. This is known as
*truncation error* and is a form of *numerical error*.
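For instance, the decimal value 0.1 has no finite binary representation, so (assuming IEEE 754 double precision, the standard on modern hardware) even the simplest decimal arithmetic exhibits truncation error:

```
print(repr(0.1 + 0.2))   # 0.30000000000000004, not 0.3
print(0.1 + 0.2 == 0.3)  # False
```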

Other forms of numerical error occur when a computer performs
mathematical operations on fixed-precision numbers. Truncation error is
again an issue: for example, when a very small number is added to a very
large number, the least significant digits of the small number are
overwhelmed by the significant digits of the large one. In 64-bit
arithmetic (here with Python 2.7), you may be surprised to see that although
1.0/3.0 evaluates to 0.3333333333333333, the result of 10.0/3.0 is
3.333333333333333**5**! Hence, the result of (1.0/3.0)*10.0 - 10.0/3.0
is not identically 0, but some small number.

In [1]:

```
from __future__ import print_function # needed for Python 2 / 3 compatibility
print(repr(1.0/3.0)) #we use repr to show all digits, because plain print truncates the output
print(repr(10.0/3.0))
print((1.0/3.0)*10.0 - 10.0/3.0)
```

A more significant type of error occurs when truncation errors *grow* as
a result of compounding operations. The following three operations are
the most common sources of compounding numerical error:

-   Dividing by a small number.
-   Subtracting two nearly-equal numbers.
-   Evaluating a mathematical function with argument(s) near the
    boundary of the domain on which it is defined.

Other sources are overflow and underflow, in which the magnitude of an operation's result exceeds the largest, or falls below the smallest, representable number, respectively.
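Each of these failure modes is easy to reproduce directly (again assuming IEEE 754 double precision):

```
x = 1.0 + 1e-15
print(repr(x - 1.0))      # subtracting nearly-equal numbers: not exactly 1e-15
print((x - 1.0) / 1e-15)  # dividing by a small number magnifies that error
print(1e308 * 10.0)       # overflow: inf
print(1e-308 / 1e100)     # underflow: 0.0
# Domain boundaries bite similarly: math.sqrt(y) raises ValueError if
# rounding pushes a theoretically nonnegative y slightly below 0.
```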

A mathematical subroutine that is designed to prevent compounding
numerical error is called *numerically stable* (or *robust*). As an
illustration, consider solving for $x$ in the quadratic equation
$$a x^2 + b x + c = 0.$$ As you recall from algebra, the solutions to
this equation are:
$$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}
\label{eq:QuadraticEquation}$$
if $a \neq 0$ and $b^2 - 4ac \geq 0$; if $a=0$ (and $b \neq 0$), the
single solution is $x = -c/b$. If $b^2 - 4ac < 0$, no real solutions exist.

The main issue to address is the division by $2a$ (and, similarly, the test for $a=0$) when $a$ is small. When $a$ has a small absolute value, the denominator of ($\ref{eq:QuadraticEquation}$) is close to 0, as is the numerator for one choice of sign, causing a direct evaluation of ($\ref{eq:QuadraticEquation}$) to suffer from large numerical errors, as shown below.

In [5]:

```
from __future__ import division #Needed for Python 2/3 compatibility, in case you give integers to quad
import math

def quad1(a,b,c):
    """Returns a tuple of 0, 1, or 2 solutions to the quadratic equation a x^2 + b x + c = 0"""
    if a == 0:
        return (-c/b,)
    det = b**2 - 4*a*c
    if det < 0:
        return ()
    if det == 0:
        return (-b/(2*a),)
    sqrtdet = math.sqrt(det)
    return ((-b-sqrtdet)/(2*a),(-b+sqrtdet)/(2*a))

def test(a,b,c,fn):
    xs = fn(a,b,c)
    print(" ",xs)
    print(" errors",[a*x**2 + b*x + c for x in xs])

print("Solutions to")
print("x^2-3x-4=0:")
test(1,-3,-4,quad1)
print("x^2-2x+1=0:")
test(1,-2,1,quad1)
print("x^2+x+1=0:")
test(1,1,1,quad1)
print()
print("Now, some very similar equations")
print("x+1=0:")
test(0,1,1,quad1)
#uh oh, numerical errors
print("10^-15*x^2+x+1=0:")
test(1e-15,1,1,quad1)
#...are getting worse
print("10^-17*x^2+x+1=0:")
test(1e-17,1,1,quad1)
```

However, if we cleverly rearrange ($\ref{eq:QuadraticEquation}$), we obtain a more numerically stable expression. Consider multiplying both the numerator and the denominator by $-b \mp \sqrt{b^2-4ac}$: $$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a} \cdot \frac{-b \mp \sqrt{b^2-4ac}}{-b \mp \sqrt{b^2-4ac}}$$ which simplifies to $$x = \frac{2c}{ -b \mp \sqrt{b^2-4ac} }. \label{eq:QuadraticEquation2} $$ Observe that the denominator admits two sign choices. Because $a$ is small, the square root term is close to $\sqrt{b^2} = |b|$, so one choice ($-b + \sqrt{b^2-4ac}$ when $b > 0$, or $-b - \sqrt{b^2-4ac}$ when $b < 0$) is close to zero, while the other is well-conditioned. Hence, we can take the well-conditioned solution from ($\ref{eq:QuadraticEquation2}$), and the other well-conditioned solution from ($\ref{eq:QuadraticEquation}$). The following code does just that.

In [7]:

```
def quad2(a,b,c):
    """Numerically stable version of quad1.
    Returns a tuple of 0, 1, or 2 solutions to the quadratic equation a x^2 + b x + c = 0"""
    if a == 0:
        return (-c/b,)
    det = b**2 - 4*a*c
    if det < 0:
        return ()
    if det == 0:
        if abs(a) > abs(b):
            #version 1
            return (-b/(2*a),)
        else:
            #version 2
            return (-2*c/b,)
    sqrtdet = math.sqrt(det)
    signb = 1 if b >= 0 else -1
    num = -b-signb*sqrtdet
    return (num/(2*a),2*c/num)

def test(a,b,c,fn):
    xs = fn(a,b,c)
    print(" ",xs)
    print(" errors",[a*x**2 + b*x + c for x in xs])

print("Solutions to")
print("x^2-3x-4=0:")
test(1,-3,-4,quad2)
print("x^2-2x+1=0:")
test(1,-2,1,quad2)
print("x^2+x+1=0:")
test(1,1,1,quad2)
print()
print("Now, some very similar equations")
print("x+1=0:")
test(0,1,1,quad2)
#numerical errors are better
print("10^-15*x^2+x+1=0:")
test(1e-15,1,1,quad2)
#and at least are correct for the solution that's not so huge
print("10^-17*x^2+x+1=0:")
test(1e-17,1,1,quad2)
```
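The heart of the fix can also be seen in isolation. In the $10^{-17} x^2 + x + 1 = 0$ case, $b^2 - 4ac = 1 - 4 \times 10^{-17}$ rounds to exactly $1$ in double precision, so the naive numerator $-b + \sqrt{b^2-4ac}$ cancels to exactly zero, while the rearranged form recovers the correct small-magnitude root $\approx -1$:

```
import math

a, b, c = 1e-17, 1.0, 1.0
sq = math.sqrt(b*b - 4*a*c)   # rounds to exactly 1.0
print((-b + sq) / (2*a))      # naive formula: 0.0 (catastrophic cancellation)
print(2*c / (-b - sq))        # rearranged formula: -1.0
```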
