An $n$-dimensional *vector* $\mathbf{x}$ is a tuple of real numbers
$\mathbf{x}=(x_1,\ldots,x_n) \in \mathbb{R}^{n}$. In later portions of this course,
we will usually drop the boldface.

Vectors can be added and subtracted component-wise, can be multiplied by
elements of $\mathbb{R}$, and can be divided by elements of
$\mathbb{R} \setminus \{0\}$. There is a zero vector $\mathbf{0}$ with all
elements equal to zero. Each element has a component-wise negation,
$-\mathbf{x}$. We will also on occasion refer to the standard *basis vectors*
$\mathbf{e}_{1},\ldots,\mathbf{e}_{n}$, where $\mathbf{e}_{i}$ has all elements equal to 0
except for the $i$'th element, which is equal to 1. Informally, we often
refer to $\mathbf{e}_{1}$ as the x-axis, $\mathbf{e}_{2}$ as the y-axis, and so on.

Dot product. The dot product is a function that takes two vectors $\mathbf{x}=(x_1,\ldots,x_n)$ and $\mathbf{y}=(y_1,\ldots,y_n)$ and returns a real number, and is given by the expression $$\mathbf{x} \cdot \mathbf{y} = \sum_{i=1}^n x_i y_i.$$

For example, $\mathbf{x} \cdot \mathbf{e}_{i} = x_i$.
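
Below is a minimal NumPy sketch of the dot product; the vectors chosen are arbitrary examples.

```python
import numpy as np

# Example vectors (arbitrary values).
x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])

print(np.dot(x, y))   # 1*4 + 2*5 + 3*6 = 32.0
print(x @ y)          # the @ operator computes the same dot product

# Dotting with a standard basis vector e_i picks out the i'th component.
e1 = np.array([1.0, 0.0, 0.0])
print(np.dot(x, e1))  # 1.0, i.e., x_1
```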

Orthogonal vectors. Two vectors whose dot product is zero are called *orthogonal*.

For example, $\mathbf{e}_{i} \cdot \mathbf{e}_{j} = 0$ for any pair $i\neq j$, and so $\mathbf{e}_{i}$ and $\mathbf{e}_{j}$ are orthogonal. Also, $\begin{bmatrix}1 \\ 2 \end{bmatrix} \cdot \begin{bmatrix} -6 \\ 3 \end{bmatrix} = 0$, so these vectors are orthogonal as well.

Norms. A norm describes some notion of vector magnitude. The standard Euclidean norm $\|\cdot\|$ is defined as $\|\mathbf{x}\|=\sqrt{\mathbf{x} \cdot \mathbf{x}}$.

Some identities are $\|\mathbf{0}\|=0$, $\|\mathbf{x}\| > 0$ if $\mathbf{x} \neq \mathbf{0}$, and $\|c\mathbf{x}\| = |c|\,\|\mathbf{x}\|$ for all real values $c$.
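
These identities can be checked numerically; the following NumPy sketch uses an arbitrary example vector.

```python
import numpy as np

x = np.array([3.0, 4.0])
c = -2.0

print(np.linalg.norm(x))            # sqrt(3^2 + 4^2) = 5.0
print(np.linalg.norm(np.zeros(2)))  # ||0|| = 0.0
# ||c x|| = |c| ||x||
print(np.isclose(np.linalg.norm(c * x), abs(c) * np.linalg.norm(x)))  # True
```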

Unit vectors. A unit vector is a vector with unit norm: $\|\mathbf{x}\|=1$.

Euclidean distance. The Euclidean distance between two vectors is given by the norm of the difference between the vectors: $d(\mathbf{x},\mathbf{y}) = \|\mathbf{x}-\mathbf{y}\|$.

It satisfies all the criteria of a metric, and hence $\mathbb{R}^n$ equipped with $d$ is a metric space.

Cosine angle formula. The cosine of the angle between two unit vectors is equal to their dot product, i.e. $\cos(\theta) = \mathbf{a} \cdot \mathbf{b}$.
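
As an illustration, the following sketch normalizes two arbitrary vectors to unit length and recovers the angle between them.

```python
import numpy as np

a = np.array([1.0, 0.0])
b = np.array([1.0, 1.0])

# Normalize to unit vectors; their dot product is then cos(theta).
a_hat = a / np.linalg.norm(a)
b_hat = b / np.linalg.norm(b)
cos_theta = np.dot(a_hat, b_hat)

print(np.degrees(np.arccos(cos_theta)))  # 45.0 degrees
```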

Linear Combinations. $\mathbf{x}$ is a *linear combination* of a set of vectors $\{\mathbf{a}_{1},\ldots,\mathbf{a}_{m}\}$ if $$\mathbf{x} = \sum_{i=1}^m u_i \mathbf{a}_{i}$$ for some set of numbers $u_1,\ldots,u_m$.
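
For example, with two arbitrary vectors $\mathbf{a}_1, \mathbf{a}_2$ and coefficients $u_1, u_2$:

```python
import numpy as np

a1 = np.array([1.0, 0.0, 1.0])
a2 = np.array([0.0, 1.0, 2.0])
u1, u2 = 2.0, -3.0

# x = u1*a1 + u2*a2 is a linear combination of a1 and a2.
x = u1 * a1 + u2 * a2
print(x)  # [ 2. -3. -4.]
```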

A matrix $A$ represents a linear transformation of an $n$-dimensional vector space to an $m$-dimensional one. It is given by an $m\times n$ array of real numbers. Usually matrices are denoted as uppercase letters (e.g., $A, B, C$), with the entry in the $i$'th row and $j$'th column denoted in the subscript $\cdot_{i,j}$, or when it is unambiguous, $\cdot_{ij}$ (e.g., $A_{1,2}, A_{12}$). $$A = \left[ \begin{array}{ccc} A_{1,1} & \cdots & A_{1,n} \\ \vdots & \ddots & \vdots \\ A_{m,1} & \cdots & A_{m,n} \end{array}\right]$$

An $m\times n$ matrix $A$ transforms vectors $\mathbf{x} =(x_1,\ldots,x_n)$
into $m$-dimensional vectors $\mathbf{y} = (y_1,\ldots,y_m) = A\mathbf{x}$ as
follows: $$\begin{split}
y_1 = \sum_{j=1}^n A_{1j} x_j \\
\vdots \\
y_m = \sum_{j=1}^n A_{mj} x_j \\
\end{split}$$ Or, more concisely, $y_i = \sum_{j=1}^n A_{ij} x_j$ for
$i=1,\ldots,m$. (Note that the order matters in matrix-vector
multiplication: $\mathbf{x} A$ is not a valid operation here.)

We can see that matrix-vector multiplication is linear, that is $A(a\mathbf{x}+b\mathbf{y}) = a A \mathbf{x} + b A \mathbf{y}$ for all $a$, $b$, $\mathbf{x}$, and $\mathbf{y}$. It is also linear in terms of component-wise addition and multiplication of matrices, as long as the matrices are of the same size. More precisely, if $A$ and $B$ are both $m\times n$ matrices, then $(a A + b B)\mathbf{x} = a A\mathbf{x} + b B\mathbf{x}$ for all $a$, $b$, and $\mathbf{x}$.
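
The following NumPy sketch (with an arbitrary $2\times 3$ matrix) computes a matrix-vector product and checks the linearity property numerically.

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])    # a 2x3 matrix
x = np.array([1.0, 0.0, -1.0])
y = np.array([2.0, 1.0, 0.0])
a, b = 3.0, -2.0

print(A @ x)  # [-2. -2.]
# Linearity: A(ax + by) = a Ax + b Ay
print(np.allclose(A @ (a * x + b * y), a * (A @ x) + b * (A @ y)))  # True
```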

One special matrix that occurs frequently is the $n\times n$ *identity
matrix* $I_n$, which has 0's in all off-diagonal positions $I_{ij}$ with
$i\neq j$, and 1's in all diagonal positions $I_{ii}$. It is significant
because $I_n \mathbf{x} = \mathbf{x}$ for all $\mathbf{x} \in \mathbb{R}^n$.

When two linear transformations are performed one after the other, the
result is also a linear transformation. Suppose $A$ is $m\times n$, $B$
is $n \times p$, and $\mathbf{x}$ is a $p$-dimensional vector, and consider
the result of $A(B\mathbf{x})$ (that is, first multiplying by $B$ and then
multiplying the result by $A$). We see that
$$B\mathbf{x} = (\sum_{j=1}^p B_{1j} x_j,\ldots,\sum_{j=1}^p B_{nj} x_j)$$
and
$$A \mathbf{y} = (\sum_{k=1}^n A_{1k} y_k,\ldots,\sum_{k=1}^n A_{mk} y_k)$$
So
$$A (B \mathbf{x}) = \left(\sum_{k=1}^n A_{1k} (\sum_{j=1}^p B_{kj} x_j),\ldots,\sum_{k=1}^n A_{mk} (\sum_{j=1}^p B_{kj} x_j)\right).$$
Rearranging the summations, we see that
$$A (B \mathbf{x}) = \left(\sum_{j=1}^p \left(\sum_{k=1}^n A_{1k} B_{kj}\right) x_j,\ldots,\sum_{j=1}^p \left(\sum_{k=1}^n A_{mk} B_{kj}\right) x_j\right).$$
In other words, we could have $A(B\mathbf{x}) = C \mathbf{x}$ if we were to form a
matrix $C$ such that $$C_{ij} = \sum_{k=1}^n A_{ik} B_{kj}.$$ This is
exactly the definition of the *matrix product*, and we say $C=AB$. The
entry $C_{ij}$ of $C$ can also be obtained by taking the dot product of the
$i$'th row of $A$ and the $j$'th column of $B$.

By the above derivation we can drop the parentheses:
$A(B\mathbf{x}) = (AB)\mathbf{x}$. So, matrix-vector and matrix-matrix
multiplication are *associative*. Note again, however, that matrix-matrix
multiplication *is not commutative*; that is, $AB \neq BA$ in general.
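
A quick numerical check of associativity and non-commutativity, using arbitrary $2\times 2$ matrices:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.array([[0.0, 1.0],
              [1.0, 0.0]])
x = np.array([1.0, -1.0])

print(np.allclose(A @ (B @ x), (A @ B) @ x))  # True: associativity
print(np.allclose(A @ B, B @ A))              # False: AB != BA in general
```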

Note that if we write an $n$-dimensional vector $\mathbf{x}$ stacked into an $n\times 1$ matrix $x$ (denoted in lowercase), we can turn the matrix-vector product $\mathbf{y}=A\mathbf{x}$ into the matrix product $y = A x$. Here, if $A$ is an $m\times n$ matrix, then $y$ is an $m\times 1$ matrix. $$\left[\begin{array}{c} y_1 \\ \vdots \\ y_m \end{array}\right] = \left[ \begin{array}{ccc} A_{1,1} & \cdots & A_{1,n} \\ \vdots & \ddots & \vdots \\ A_{m,1} & \cdots & A_{m,n} \end{array}\right] \left[\begin{array}{c} x_1 \\ \vdots \\ x_n \end{array}\right]$$ Hence, there is a one-to-one correspondence between vectors and matrices with one column. These matrices are called column vectors and will be our default notation for vectors throughout the rest of the course. We will occasionally also deal with row vectors, which are matrices with a single row.

The transpose $A^T$ of a matrix $A$ simply switches $A$'s rows and columns. $$(A^T)_{ij} = A_{ji}.$$ If $A$ is $m \times n$, then $A^T$ is $n \times m$.

Symmetric matrix. A square matrix $A$ is symmetric iff $A = A^T$.
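
A short sketch of the transpose and a symmetry check (the matrix is an arbitrary example):

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])  # 2x3
print(A.T)                       # its 3x2 transpose

S = A @ A.T                      # A A^T is square and symmetric
print(np.allclose(S, S.T))       # True
```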

An *inverse* $A^{-1}$ of an $n\times n$ square matrix $A$ is a matrix
that satisfies the following equation: $$A A^{-1} = A^{-1} A = I_n$$
where $I_n$ is the identity matrix. Not all square matrices have an
inverse, in which case we say $A$ is not invertible (or singular).
Invertible matrices are significant because the unique solution $x$ to
the system of linear equations $Ax = b$ is simply $A^{-1} b$. This
holds for any $b$. If the matrix is not invertible, then such an
equation may or may not have a solution.
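
The sketch below solves a small invertible system both by forming $A^{-1}$ explicitly and by calling a linear solver; the matrix and right-hand side are arbitrary examples.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])

x1 = np.linalg.inv(A) @ b    # x = A^{-1} b
x2 = np.linalg.solve(A, b)   # preferred in practice: solve without forming A^{-1}
print(x1, x2, np.allclose(x1, x2))  # [0.8 1.4] [0.8 1.4] True
```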

Orthogonal matrix. An orthogonal matrix is a square matrix that satisfies $A A^T = I_n$. In other words, its transpose is its inverse.
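
A standard example of an orthogonal matrix is a 2D rotation matrix; the check below uses an arbitrary angle.

```python
import numpy as np

theta = np.pi / 3
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

print(np.allclose(A @ A.T, np.eye(2)))     # True: A is orthogonal
print(np.allclose(A.T, np.linalg.inv(A)))  # True: A^T is A^{-1}
```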

Identities involving the transpose:

$(cA)^T = c A^T$ for any real value $c$.

$(A+B)^T = A^T + B^T$.

$(AB)^T = B^T A^T$.

All $1\times 1$ matrices are symmetric, the identity matrix is symmetric, and all uniform scalings of a symmetric matrix are symmetric.

$A + A^T$ is symmetric.

The dot product $\mathbf{x} \cdot \mathbf{y}$ is equal to $x^T y$, with $x$ and $y$ denoting the column vector representations of $\mathbf{x}$ and $\mathbf{y}$, respectively.

$x^T A y = y^T A^T x$, with $x$ and $y$ column vectors.
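
These identities can be spot-checked numerically on random matrices, as in the sketch below (the shapes are arbitrary).

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 2))
x = rng.standard_normal(3)
y = rng.standard_normal(4)
c = 2.5

print(np.allclose((c * A).T, c * A.T))     # (cA)^T = c A^T
print(np.allclose((A @ B).T, B.T @ A.T))   # (AB)^T = B^T A^T
print(np.isclose(x @ A @ y, y @ A.T @ x))  # x^T A y = y^T A^T x
```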

Identities involving the inverse:

$I_n^{-1} = I_n$.

$(cA)^{-1} = \frac{1}{c} A^{-1}$ for any real value $c\neq 0$.

$(AB)^{-1} = B^{-1}A^{-1}$ if both $B$ and $A$ are invertible.

If $A$ and $B$ are invertible, then $(ABA^{-1})^{-1} = A B^{-1}A^{-1}$.

The pseudoinverse generalizes the inverse to matrices that are not invertible, including matrices that are not square. For an $m \times n$ matrix $A$, the pseudoinverse $A^+$ is defined as an $n \times m$ matrix that has the following properties:

$A A^+ A = A$

$A^+ A A^+ = A^+$

$(A A^+)^T = A A^+$

$(A^+ A)^T = A^+ A$

It has the following properties:

If $A$ is invertible, then $A^+ = A^{-1}$.

If multiple solutions to $A x = b$ exist, then $x = A^+ b$ is the solution that minimizes $\| x \|$.

If no solutions to $A x = b$ exist, then $x = A^+ b$ is the solution that minimizes $\| A x - b \|$ (least squares solution).

The pseudoinverse is available in most major linear algebra packages. It can be computed using the singular value decomposition (SVD), which is itself one of the most useful tools in scientific computing (see Appendix B.2).
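
In NumPy the pseudoinverse is available as `np.linalg.pinv`. The sketch below uses an arbitrary overdetermined system with no exact solution, so $A^+ b$ is the least-squares solution.

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])     # 3x2: more equations than unknowns
b = np.array([1.0, 1.0, 0.0])  # no exact solution to A x = b

x = np.linalg.pinv(A) @ b      # least-squares solution, approx [1/3, 1/3]
print(x)
# Matches the dedicated least-squares solver.
print(np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0]))  # True
```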

An $n\times n$ matrix $A$ is called *symmetric positive definite*
(s.p.d.), or just positive definite, if it is symmetric ($A=A^T$) and
satisfies the following condition:
$$\mathbf{x}^T A \mathbf{x} > 0 \text{ for all nonzero }\mathbf{x}\in \mathbb{R}^n.$$
Some useful properties include:

- An identity matrix is s.p.d.
- A scaling $A = cB$ of a s.p.d. matrix $B$ is s.p.d. iff $c > 0$.
- Any matrix $A = B^T B$, with $B$ an $m\times n$ matrix of rank $n$ (full column rank), is s.p.d.
- A symmetric matrix with eigenvalues $\lambda_1,\ldots,\lambda_n$ is symmetric positive definite if and only if $\lambda_i > 0$ for all $i=1,\ldots,n$.
- An s.p.d. matrix is invertible.

A matrix is positive semi-definite (p.s.d.) if the strict positivity condition is replaced with a nonnegativity condition: $$\mathbf{x}^T A \mathbf{x} \geq 0 \text{ for all }\mathbf{x}\in \mathbb{R}^n.$$

- Any s.p.d. matrix is also p.s.d.
- Any matrix $A=B^TB$ is also p.s.d.
- A symmetric matrix with eigenvalues $\lambda_1,\ldots,\lambda_n$ is positive semi-definite if and only if $\lambda_i \geq 0$ for all $i=1,\ldots,n$.
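
The eigenvalue tests above suggest a simple numerical check for definiteness; the sketch below builds an s.p.d. matrix from an arbitrary full-rank $B$.

```python
import numpy as np

B = np.array([[1.0, 2.0],
              [0.0, 1.0],
              [1.0, 1.0]])   # 3x2 with rank 2 (full column rank)
A = B.T @ B                  # therefore A = B^T B is s.p.d.

eigvals = np.linalg.eigvalsh(A)  # eigenvalues of a symmetric matrix
print(eigvals)
print(np.all(eigvals > 0))       # True: positive definite
print(np.all(eigvals >= 0))      # True: also positive semi-definite
```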
