A fundamental concept in computer science is the *graph* (aka, network).
This is a structure that defines a set of objects and connectivity
relations between pairs of objects. The objects are known as *vertices*
(or sometimes *nodes*), and the relations are known as *edges* (or
sometimes *arcs*).

A graph $G = (V,E)$ consists of a set of vertices $V$ and a set of edges
$E$, in which each edge is a pair $(u,v)$ with $u\in V$ and $v \in V$.
For every such pair $(u,v)$, $v$ is called a *neighbor* of $u$. We can
also define the *neighbors* of a vertex as
$N(u) = \{ v \in V \quad |\quad (u,v)\in E\}$.

Graphs can either be *directed* in which the existence of an edge
$(u,v)$ indicates that a path can move from $u$ to $v$, or *undirected*
in which paths can run either direction along an edge. For the
undirected case, we shall simply require that if $(u,v) \in E$ then
$(v,u) \in E$ as well.

For each edge $(u,v)$ we say that $u$ is the *tail* of the edge and $v$
is its *head*. A sequence of connected edges
$(v_0,v_1),(v_1,v_2),\ldots,(v_{k-1},v_k)$ with each
$(v_{i-1},v_i)\in E$ is known as a *path* in the graph. If a path exists
between two vertices in an undirected graph, we say the vertices are
*connected*. A path that starts and ends at the same vertex ($v_k=v_0$)
is called a *cycle*.

A *tree* is a special kind of undirected graph in which there are no
cycles and in which all vertices are connected. A tree can be thought of
a directed graph in which each vertex $v$ is the head of at most one
edge in $E$. In fact, all vertices are the head of exactly one edge
except one special vertex, called the *root* of the tree
(Fig. 1.a).

In a directed tree, we define two concepts for each vertex $v$. The
*parent* of $v$, $Parent(v)$ is defined as the unique vertex $u$ for
which $(u,v)\in E$, or *nil* if $v$ is the root. The *children* of $v$
are defined as the heads of edges for which $v$ is a tail:
$$Children(v) = \{ w \in V \quad |\quad (v,w) \in E\}.$$ If a vertex has
no children, then it is called a *leaf*. Trees are usually drawn in
downward fashion starting from the root, with all children of a node on
the next level down.

With this definition, we can also define the *ancestors* of a vertex $v$
which is the set of nodes along the path from the root to $v$, inclusive
(Fig. 1.b). This set includes $v$, $v$'s parent, the
parent of $v$'s parent, and so on, until the root is reached. Similarly,
the *descendants* of $v$ are the set of nodes $w$ for which either $v=w$
or there exists a directed path from $v$ to $w$. In other words, $v$'s
descendants are $v$, $v$'s children, $v$'s children's children, and so
on. The tree consisting of $v$'s descendants and edges between them is
also known as the *subtree* at $v$. In the subtree, $v$ is the root
(Fig. 1.c).

Representations of trees are simpler than general graphs, so we will
describe them first. In most object-oriented programming languages, a
tree can be represented as a simple hierarchical data structure. Define
a structure `TreeNode`

representing a vertex and containing its entire
subtree. It has the following members:

`data`

: application-specific data to be stored with the vertex.`children`

: array or list of`TreeNode`

s.`parent`

: optional, a`TreeNode`

reference.

Then the entire tree is stored as the root `TreeNode`

. With this
structure it is possible to search downward from the root for any
desired node. If the parent member is present, then it is also possible
to search upward from any node.

Graph data structures are a bit more complex. The most straightforward
representation, which stores $V$ and $E$ as lists of vertices and pairs
of vertices, respectively, is not computationally convenient for most
purposes. For example, to determine a vertex's neighbors $N(v)$, one
would need to loop through *all* of the edges in $E$ to find if $v$ is
one of the edge's endpoints. This is an $O(|E|)$ operation, which is
$O(|V|^2)$ in the worst case! Since this is a fundamental operation to
most interesting operations on graphs (determining connectedness, graph
search, etc) the *adjacency list* representation is often preferred.

The adjacency list representation stores for each vertex $v$ the list of
its neighbors $N(v)$. Specifically, if we were to number the
$n\equiv |V|$ vertices as $0,1,\ldots,n-1$ (assuming 0-based indexing in
the programming language), the adjacency list representation stores a
nested list `edges`

$[0], \ldots,$ `edges`

$[n-1]$ where each `edges`

$[i]$
is a list of vertex indices in $\{ 0,\ldots,n-1 \}$.
The code below illustrates this for a simple graph shown in Fig. 2.

```
data = ['A','B','C','D'] #vertex names
edges = [[1,2],#indices of neighbors of 'A'
[3], #indices of neighbors of 'B'
[3], #indices of neighbors of 'C'
[]] #indices of neighbors of 'D'
```

Any application-specific data associated with vertices is stored as an
auxiliary array `data`

$[0], \ldots,$ `data`

$[n-1]$. The main
disadvantage of this representation is that identifying whether an edge
exists requires iterating through the adjacency list of the edge's tail.

The alternative *adjacency matrix* representation is useful for a number
of operations, such as determining whether an edge exists in the graph
and various graph-theoretic operations. This represents the edges of a
graph as an $n \times n$ array such that `edges`

$[i,j]$ is 1 if there is
an edge in $E$ from the vertex indexed by $i$ to the vertex indexed by
$j$, and 0 otherwise. Again we are assuming 0-based indexing.
the matrix
$$\begin{bmatrix}
0 & 1 & 1 & 0 \\
0 & 0 & 0 & 1 \\
0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0
\end{bmatrix}$$
gives the adjacency list for the graph shown in Fig. 2. The main
disadvantages of this representation are that 1) it takes $n^2$ space to
store, and 2) determining neighbors of a vertex requires checking $n$
entries.

In [ ]:

```
```