Week 06

Graphs and Graph Algorithms

Graphs 2/64

Many applications require

a collection of items (i.e. a set)
relationships/connections between items

Examples:

maps: items are cities, connections are roads
web: items are pages, connections are hyperlinks
facebook: items are people, connections are "friends"

Collection types we've seen so far

sets ... unordered collection of items
lists ... linear sequence of items

Graphs are more general ... allow arbitrary connections.

... Graphs 3/64

A graph G = (V,E)

V is a set of vertices
E is a set of edges (subset of V×V)

Example:

... Graphs 4/64

A real example: Australian road distances

Distance	Adelaide	Brisbane	Canberra	Darwin	Melbourne	Perth	Sydney
Adelaide	-	2055	1390	3051	732	2716	1605
Brisbane	2055	-	1291	3429	1671	4771	982
Canberra	1390	1291	-	4441	658	4106	309
Darwin	3051	3429	4441	-	3783	4049	4411
Melbourne	732	1671	658	3783	-	3448	873
Perth	2716	4771	4106	4049	3448	-	3972
Sydney	1605	982	309	4411	873	3972	-

Notes: vertices are cities, edges are distance between cities, symmetric

... Graphs 5/64

Alternative representation of above:

[Diagram:Pics/graphs/ausroads-small.png]

... Graphs 6/64

Questions we might ask about a graph:

is there a way to get from item A to item B?
what is the best way to get from A to B?
which vertices are connected?

Graph algorithms are generally more complex than tree/list ones:

no implicit order of items
graphs may contain cycles
concrete representation is less obvious
algorithm complexity depends on connection complexity

Properties of Graphs 7/64

Terminology: |V | and |E | normally written just as V and E.

A graph with V vertices has at most V(V-1)/2 edges.

The ratio V:E can vary considerably.

if E is closer to V²/2, the graph is dense
if E is closer to V, the graph is sparse

Knowing whether a graph is sparse or dense is important

may affect choice of data structures to represent graph
may affect choice of algorithms to process graph

Describing Graphs 8/64

Defining graphs:

need some way of identifying vertices (e.g. number 1..V)
could give diagram showing edges and vertices
could give a list of edges

E.g. four representations of the same graph:

[Diagram:Pics/graphs/graph-reps-small.png]

Graph Terminology 9/64

For an edge e, that connects vertices v and w

v and w are adjacent (neighbours)
e is incident on both v and w

Degree of a vertex v

number of edges incident on v

Synonyms:

vertex = node, edge = arc = link

A node may contain other info apart from key (identifier).

An edge may have additional info (e.g. weight, direction)

... Graph Terminology 10/64

Path: a sequence of vertices where

each vertex is has an edge to its predecessor

Cycle: a path where

last vertex in path is same as first vertex in path

[Diagram:Pics/graphs/pc-graphs-small.png]

... Graph Terminology 11/64

Connected graph

there is a path from each vertex to every other vertex
if a graph is not connected, it has connected components

Complete graph

there is an edge from each vertex to every other vertex
in a complete graph, E = V(V-1)/2

[Diagram:Pics/graphs/complete-small.png]

... Graph Terminology 12/64

Tree: connected (sub)graph with no cycles

Spanning tree: tree containing all vertices

Clique: complete subgraph

Consider the following single graph:

This graph has 26 vertices, 32 edges, and 4 connected components

... Graph Terminology 13/64

Other types of graphs ...

Directed graph (di-graph)

each edge has an associated direction

Weighted graph

each edge has an associated value (weight)

Multi-graph

allow multiple edges between two vertices
e.g. function call graph (f() calls g() in several places)

Graph ADT 14/64

Data:

set of edges, set of vertices

Operations:

building: create graph, create edge, add edge
deleting: remove edge, drop whole graph
scanning: get edges, copy, show

Notes:

set of vertices is fixed when graph initialised
we treat vertices as ints, but could be Items

... Graph ADT 15/64

Graph ADT interface:

// graph representation is hidden
typedef struct GraphRep *Graph;
// vertices denoted by integers 0..N-1
typedef int Vertex;
int   validV(Graph,Vertex); //validity check
// edges are pairs of vertices (end-points)
typedef struct { Vertex v; Vertex w; } Edge;
Edge mkEdge(Graph, Vertex, Vertex);
// operations on graphs
Graph newGraph(int nV);  // #vertices
void  insertE(Graph, Edge);
void  removeE(Graph, Edge);
Graph copy(Graph);
void  destroy(Graph);
void  show(Graph);

Introduce other functions as we go ...

... Graph ADT 16/64

Graph representation:

// GraphRep has to include V and E
typedef struct GraphRep {
   int  nV;  // # vertices
   int  nE;  // # edges
   ...
   // Representation for vertices
   // e.g. array [0..nV-1] of Items
   ...
   // Representation for edges
   // some possibilities to be discussed shortly
} GraphRep;

... Graph ADT 17/64

Implementation of foundation operations:

// make an edge for a given Grpah
Edge mkEdge(Graph g, Vertex v, Vertex w)
{
   assert(validV(g,v) && validV(g,w));
   Edge e = {v,w}; // struct assignment
   // or  Edge e; e.v = v; e.w = w;
   return e;
}

// is a vertex valid in a given Graph?
static int validV(Graph g, Vertex v)
{
   return (g != NULL && v >= 0 && v < g->nV);
}

Exercise 1: Canonical Edge Representation 18/64

Above mkEdge() simply puts vertices in edge in param order

Implement an edge equality check under such a scheme:

bool eqE(Edge a, Edge b) { ... }

Returns True if a == (v,w) && b == (v,w) OR a == (v,w) && b == (w,v)

Also ...

How would mkEdge() and eqE() change if

all edges (v,w) were created to ensure v < w

Array-of-edges Representation 19/64

Implementation of GraphRep

typedef struct GraphRep {
   int   nV;    // #vertices
   int   nE;    // #edges
   int   n;     // size of edge array
   Edge *edges; // array of edges
} GraphRep;

[Diagram:Pics/graphs/graph-array-edges-rep-small.png]

... Array-of-edges Representation 20/64

Implementation of graph initialisation:

Graph newGraph(int nV)
{
   assert(nV >= 0);
   int n = BigEnough;
   Edge *e = malloc(n*sizeof(Edge));
   assert(e != NULL);
   Graph g = malloc(sizeof(GraphRep));
   assert(g != NULL);
   g->nV = nV;  g->nE = 0;
   g->n = n;  g->edges = e;
   return g;
}

How large should n be? ... No more than (V²-V)/2

... Array-of-edges Representation 21/64

Implementation of edge insertion/removal:

void insertE(Graph g, Edge e)
{
   assert(g != NULL && g->nE < g->n);
   int i, nE = g->nE;
   for (i = 0; i < nE; i++)
      if (eqE(e,g->edges[i])) break;
   if (i == nE)
      g->edges[g->nE++] = e;
}
void removeE(Graph g, Edge e)
{
   assert(g != NULL);
   int i, nE = g->nE;
   for (i = 0; i < nE; i++)
      if (eqE(e,g->edges[i])) break;
   if (i < nE) {
      g->edges[i] = g->edges[nE-1];
      g->nE--;
   }
}

Exercise 2: Two points to consider 22/64

Implement an edge comparison function

cmpE(a,b) < 0 if a < b
cmpE(a,b) > 0 if a > b
cmpE(a,b) == 0 if a == b
int cmpE(Edge,Edge);

which compares first on v, then on w

The fixed array size is a limitation.
Suggest two possible ways to deal with this issue.

Exercise 3: Sorted Array of Edges 23/64

To improve performance, we wish to maintain edges in sorted order.
Rewrite the two functions below to achieve this:
```
void insertE(Graph g, Edge e) { ... }
void removeE(Graph g, Edge e) { ... }
```
Assume the existence of a binary search function:
```
// returns potential location for e in edges
int bSearch(Edge edges[], int lo, int hi, Edge e)
```
Assume that edges are defined as (x,y) where x<y

Exercise 4: Checking Neighbours (i) 24/64

Assuming an array-of-edges representation ...
Write a function to check whether two vertices are directly connected by an edge
Use the function interface:
```
int connected(Graph g, Vertex x, Vertex y) { ... }
```
Assume that
- we use canonical edge representation (v < w)
- we maintain the array of edges in sorted order
Don't assume that x < y.

Exercise 5: Getting Neighbours (i) 25/64

Assuming an array-of-edges representation ...
Write a function to give a list of all vertices connected to a given vertex
```
Vertex *neighbours(Graph g, Vertex v, int *nv);
```
which
- returns a dynamically allocated array of vertices (vs)
- sets the value of the nv parameter to size of array
Usage:
```
Graph g;  Vertex v;  int n;
...
Vertex *ns = neighbours(g, v, &n);
```
... Array-of-edges Representation 26/64

Cost of operations:
- initialisation: O(1) (malloc GraphRep and edges array)
- insert edge: O(E) (assuming edge array has space)
- delete edge: O(E) (need to find edge in edge array)
If array is full on insert
- malloc a bigger array, copy edges across ⇒ still O(E)
If we maintain edges in order
- use binary search to find edge ⇒ O(logE)
Adjacency Matrix Representation 27/64

Edges represented by a V × V matrix

... Adjacency Matrix Representation 28/64

Implementation of GraphRep
```
typedef struct GraphRep {
   int   nV;    // #vertices
   int   nE;    // #edges
   int **edges; // matrix of booleans
} GraphRep;
```
... Adjacency Matrix Representation 29/64

Implementation of graph initialisation:
```
Graph newGraph(int nV)
{
   assert(nV >= 0);
   int i, j;
   int **e = malloc(nV * sizeof(int *));
   assert(e != NULL);
   for (i = 0; i < nV; i++) {
      e[i] = malloc(nV * sizeof(int));
      assert(e[i] != NULL);
      for (j = 0; j < nV; j++)
         e[i][j] = 0;
   }
   Graph g = malloc(sizeof(GraphRep));
   assert(g != NULL);
   g->nV = nV;  g->nE = 0;  g->edges = e;
   return g;
}
```
Validity Checking 30/64

For asserts on various kinds of objects:
```
int validG(Graph g) {
	return (g != NULL);   // other checks?
}
int validV(Graph g, Vertex v) {
	return (validG(g) && 0 <= v && v < g->nV);
}
int validE(Graph g, Edge e) {
	return (validV(g,e.v) && validV(g,e.w));
}
```
Could also be done as C pre-processor macros, e.g.
```
#define validG(g)  ((g) != NULL)
```
Adjacency Matrix Representation 31/64

Implementation of GraphRep
```
typedef struct GraphRep {
   int   nV;    // #vertices
   int   nE;    // #edges
   int **edges; // matrix of booleans
} GraphRep;
```
... Adjacency Matrix Representation 32/64

Implementation of edge insertion/removal:
```
void insertE(Graph g, Edge e)
{
   assert(validG(g) && validE(g,e));
   if (g->edges[e.v][e.w]) return;
   g->edges[e.v][e.w] = 1;
   g->edges[e.w][e.v] = 1;
   g->nE++;
}
void removeE(Graph g, Edge e)
{
   assert(validG(g) && validE(g,e));
   if (!g->edges[e.v][e.w]) return;
   g->edges[e.v][e.w] = 0;
   g->edges[e.w][e.v] = 0;
   g->nE--;
}
```
Exercise 6: Checking Neighbours (ii) 33/64

Assuming an adjacency matrix representation ...
Write a function to check whether two vertices are directly connected by an edge
Use the function interface:
```
int connected(Graph g, Vertex x, Vertex y);
```
Exercise 7: Getting Neighbours (ii) 34/64

Assuming an adjacency matrix representation ...
Write a function to give a list of all vertices connected to a given vertex
```
Vertex *neighbours(Graph g, Vertex v, int *nv);
```
which
- returns a dynamically allocated array of vertices (vs)
- sets the value of the nv parameter to size of array
Usage:
```
Graph g;  Vertex v;  int n;
...
Vertex *ns = neighbours(g, v, &n);
```
... Adjacency Matrix Representation 35/64

Storage cost: V int ptrs + V² ints
Storage optimisation: store only top-right part of matrix.
Cost of operations:
- initialisation: O(V²) (initialise V×V matrix)
- insert edge: O(1) (set two cells in matrix)
- delete edge: O(1) (unset two cells in matrix)
Adjacency List Representation 36/64

For each vertex, store linked list of adjacent vertices:

... Adjacency List Representation 37/64

Implementation of GraphRep
```
typedef struct vNode *VList;
struct vNode { Vertex v; vList next; };
typedef struct graphRep GraphRep;
struct graphRep {
   int   nV;     // #vertices
   int   nE;     // #edges
   VList *edges; // array of lists
};
```
... Adjacency List Representation 38/64

Implementation of core operations:
```
Graph newGraph(int nV)
{
   int i, j;
   VList *e = malloc(nV * sizeof(VList));
   assert(e != NULL);
   for (i = 0; i < nV; i++) e[i] = newVList();
   Graph g = malloc(sizeof(GraphRep));
   assert(g != NULL);
   g->nV = nV;  g->nE = 0;  g->edges = e;
   return g;
}
```
... Adjacency List Representation 39/64

Implementation of edge insertion/removal (assuming VList ADT):
```
void insertE(Graph g, Edge e)
{
   assert(validG(g) && validE(g,e));
   int orig = length(g->edges[e.v]);
   g->edges[e.v] = insert(g->edges[e.v], e.w);
   g->edges[e.w] = insert(g->edges[e.w], e.v);
   if (length(g->edges[e.v]) > orig) g->nE++;
}
void removeE(Graph g, Edge e)
{
   assert(validG(g) && validE(g,e));
   int orig = length(g->edges[e.v]);
   g->edges[e.v] = delete(g->edges[e.v], e.w);
   g->edges[e.w] = delete(g->edges[e.w], e.v);
   if (length(g->edges[e.v]) < orig) g->nE--;
}
```
Exercise 8: Checking Neighbours (iii) 40/64

Assuming an adjacency list representation ...
Write a function to check whether two vertices are directly connected by an edge
Use the function interface:
```
int connected(Graph g, Vertex x, Vertex y);
```
Do we care about edge representation (e.g. v < w) ?

Exercise 9: Getting Neighbours (iii) 41/64

Assuming an adjacency list representation ...
Write a function to give a list of all vertices connected to a given vertex
```
Vertex *neighbours(Graph g, Vertex v, int *nv);
```
which
- returns a dynamically allocated array of vertices (vs)
- sets the value of the nv parameter to size of array
Usage:
```
Graph g;  Vertex v;  int n;
...
Vertex *ns = neighbours(g, v, &n);
```
Exercise 10: Adjacency List Storage 42/64

In the above representation, each edge (x,y) ...
- has an entry for y in x's list
- has an entry for x in y's list
which means ...
- we don't need to worry about canonical edge representation
- but we have a significant storage overhead 2×E
How could we reduce the amount of storage needed for lists?
How would the insertE() and removeE() functions change?

... Adjacency List Representation 43/64

Storage cost: V list ptrs + 2*E nodes ≅ O(V+E)
Cost of operations (if VLists unsorted):
- initialisation: O(V) (initialise VLists)
- insert edge: O(V) (insert vertex into list, if not there)
- delete edge: O(V) (need to find vertex in list)
- isConnected: O(V) (need to find vertex in list)
Comparison of Graph Representations 44/64

array
of edges adjacency
matrix adjacency
list

space usage E V² V+E

initialise 1 V² V

copy E V² V+E

destroy 1 V V+E

insert edge E 1 V

remove edge E 1 V

connected E 1 V

neighbours E V V

Graph Algorithms

Problems on Graphs 46/64

Above ADT represents and manipulates graphs at low level.
What kind of problems do we want to solve on/via graphs?
An Aside: Complexity Classes 47/64

Many of the above problems are solved, but some are not.
For the solved problems ...
- some can be solved efficiently via a simple algorithm
- some have polynomial worst-case performance (e.g. O(V²))
- some have exponential worst-case performance (e.g. O(2^V))
Classes of problems:
- P = an algorithm can compute answer in polynomial time
- NP = no P algorithm is known for solving the problem
... An Aside: Complexity Classes 48/64

The P and NP classes suggest "level of difficulty":
- P ... "easy" ... has known polynomial-time algorithm
- NP ... "hard" ... no P algorithm known
Another characterisation of "difficulty":
- easy ... have a polynomial-time algorithm (useful in practice)
- tractable ... have an algorithm, feasible only for small N
- intractable ... no tractable algorithm is known (NP-hard)
- non-computable ... no algorithm can exist
Graph Algorithms 49/64

In this part of the course, we examine algorithms for
- connectivity (simple graphs)
- path finding (simple/directed graphs)
- minimum spanning trees (weighted graphs)
- shortest path (weighted graphs)
and look at generic methods for traversing graphs.
We begin with one of the simplest graph traversals ...

Finding a Path 50/64

Problem: is there a path from vertex v to vertex w ?
As a function: bool isPath(Vertex v, Vertex w)
Approach to solving problem:
- examine vertices adjacent to v
- if any of them is w, then done
- otherwise try vertices two edges from v
- repeat, looking further and further from v
Two different approaches to order of searching:
- breadth-first search (BFS), depth-first search (DFS)
... Finding a Path 51/64

Comparison of BFS/DFS search for isPath(a,h) ...

Both approaches ignore some edges by remembering previously visited vertices.

... Finding a Path 52/64

Overall strategy for path-finding (function isPath(V,W)):
```
// is there a path from V to W?
ToVisit = {V}   // vertices to be checked
Visited = {}    // previously checked vertices
while (still some vertices in ToVisit) {
    X = remove a vertex from ToVisit
    Visited += X
    foreach (Y in neighbours(X)) {
        if (Y == W) return TRUE
        if (Y not in Visited) ToVisit += Y
    }
}
return FALSE
```
- BFS = ToVisit implemented via a queue of vertices
- DFS = ToVisit implemented via a stack of vertices
Exercise 11: BFS isPath() Function 53/64

As presented above, isPath() produces correct results.
However, it has a potential inefficiency.
What is the problem, and how could we fix it?
Hint: consider the following graph and isPath(a,e) ...

... Finding a Path 54/64

BFS algorithm:
```
int isPath(Graph g, Vertex v, Vertex w)
{
   int *visited = calloc(g->nV,sizeof(int));
   Queue q = newQueue();
   QueueJoin(q, v);
   while (!QueueEmpty(q)) {
      Vertex y, x = QueueLeave(q);
      foreach (y in neighbours(x)) {
         if (y == w) return TRUE;
         if (!visited[y]) {
            QueueJoin(q, y);
            visited[y] = 1;
         }
      }
   }
   return FALSE;
}
```
Exercise 12: Breadth-first Traversal 55/64

Show the BFS order we visit to determine isPath(a,k)

Assume neighbours are chosen in alphabetical order

... Finding a Path 56/64

DFS algorithm:
```
int isPath(Graph g, Vertex v, Vertex w)
{
   int *visited = calloc(g->nV,sizeof(int));
   Stack s = newStack();
   StackPush(s, v);
   while (!StackEmpty(s)) {
      Vertex y, x = StackPop(s);
      foreach (y in neighbours(x)) {
         if (y == w) return TRUE;
         if (!visited[y]) {
            StackPush(s, y);
            visited[y] = 1;
         }
      }
   }
   return FALSE;
}
```
Exercise 13: Depth-first Traversal 57/64

Show the DFS order we visit to determine isPath(a,k)

Assume neighbours are chosen in alphabetical order

Exercise 14: Finding Neighbours 58/64

How would we implement the pseudo-code:
```
foreach (y in neighbours(x))
```
for a graph represented as
- adjacency matrix
- adjacency lists
Exercise 15: Implement GraphLab 59/64

Implement a system with a user interface to
- create a random graph with V vertices, or
- create a graph by reading a definition from a file
- insert/remove edges one-by-one
- show the current state of a graph
Can be used as a basis for demo'ing later algorithms.

Graph Traversal

Graph Traversal 61/64

A common class of graph algorithms involves
- walking along edges and visiting vertices
- recording e.g. path taken, vertices visited, etc.
isPath() is a simple instance of a graph traversal
A more useful traversal would be findPath()
- show me a path from v to w, if one exists
Two approaches to graph traversal: depth-first, breadth-first.

Depth-first Traversal 62/64

Basic approach to depth-first traversal:
- visit and mark current vertex
- for each neighbour, traverse it recursively
  (choice in order to consider neighbours; we choose smallest first)
Notes:
- we number ("mark") vertices as we visit them
  (so that we can later trace path through graph)
- order ... counter; how many vertices traversed so far
- visited[v] ... order in which vertex v was visited
In this context, traversal is also called search (hence DFS)

... Depth-first Traversal 63/64

An aside on nondeterminism ...
"for each neighbour" doesn't specify a visiting order
As long as we visit all neighbours, the traversal algorithms work
Could choose a random order each time
- might lead to a different solution
- but always leads to a solution, if one exists
Such algorithms are called nondeterministic
Cf. "regular" (deterministic) algorithms ...
- always behave the same for a given start state
... Depth-first Traversal 64/64

Depth-first search (DFS) algorithm:
```
int order;     // visiting order
int *visited;  // array of visiting orders
               // indexed by vertex 0..V-1
void dfs(Graph g)
{
   int i;
   visited = malloc(g->nV*sizeof(int));
   for (i = 0; i < g->nV; i++) visited[i] = -1;
   order = 0;
   dfsR(g, 0);
}
void dfsR(Graph g, Vertex v)
{
   visited[v] = order++;
   Vertex w;
   for (w = 0; w < g->nV; w++) {
      if (!g->edges[v][w]) continue;
      if (visited[w] == -1) dfsR(g, w);
   }
}
```
Produced: 21 Aug 2017

	array of edges	adjacency matrix	adjacency list
space usage	E	V²	V+E
initialise	1	V²	V
copy	E	V²	V+E
destroy	1	V	V+E
insert edge	E	1	V
remove edge	E	1	V
connected	E	1	V
neighbours	E	V	V