5511-12

CIS 5511. Programming Techniques

Graphs (1)

1. Graphs and their representation

A graph consists of vertices (nodes) and edges (links) between them, and represents relations (of the same type) among entities.

Formally, a graph G = (V, E), where V is a set of vertices and E is a set of edges, each being a pair ‹s, d› (from vertex s to vertex d), so 0 ≤ |E| ≤ |V|². The "size" of G should be expressed by both |V| and |E|, though f(|V|, |E|) is sometimes simplified to f(V, E).

Under this definition, a graph is more general than trees and linear data structures, though it can still be further extended into multigraphs, hypergraphs, knowledge graphs, etc.

A graph may be either directed or undirected, depending on whether the relation it represents is asymmetric or symmetric.

A path is a sequence of end-to-end edges (each edge starts where the previous one ends), and its length, the number of edges, can be zero or any positive integer. A cycle (loop) is a path of positive length with the same starting and ending vertex. In a directed graph, a cycle can contain a single edge (a self-loop), while in an undirected graph, a cycle is usually required to contain at least 3 edges.

An undirected graph is connected if there is a path from every vertex to every other vertex. A directed graph is strongly connected if there is a (directed) path from every vertex to every other vertex; it is weakly connected if the graph is connected when the directions of the edges are ignored.
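As a small sketch of these definitions, assume a directed graph is stored as a Python dict that maps every vertex to a list of its successors (the function names here are hypothetical). Strong connectivity can then be tested by checking that one vertex r reaches every vertex both in the graph and in the graph with all edges reversed; weak connectivity ignores directions by storing each edge both ways:

```python
from collections import deque

def reachable(adj, s):
    """Set of vertices reachable from s by following edges (breadth-first)."""
    seen = {s}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen

def reverse_edges(adj):
    """Copy of the graph with every edge <u, v> turned into <v, u>."""
    rev = {u: [] for u in adj}
    for u in adj:
        for v in adj[u]:
            rev[v].append(u)
    return rev

def is_strongly_connected(adj):
    # Every vertex reaches every other one iff some root r reaches all vertices
    # and all vertices reach r (i.e. r reaches all of them once edges are reversed).
    if not adj:
        return True
    r = next(iter(adj))
    n = len(adj)
    return len(reachable(adj, r)) == n and len(reachable(reverse_edges(adj), r)) == n

def is_weakly_connected(adj):
    # Ignore edge directions by storing each edge in both directions.
    if not adj:
        return True
    both = {u: list(adj[u]) for u in adj}
    for u in adj:
        for v in adj[u]:
            both[v].append(u)
    return len(reachable(both, next(iter(adj)))) == len(adj)

cycle = {"a": ["b"], "b": ["c"], "c": ["a"]}
chain = {"a": ["b"], "b": ["c"], "c": []}
print(is_strongly_connected(cycle), is_strongly_connected(chain))  # True False
print(is_weakly_connected(chain))                                  # True
```

The single-root test works because if every vertex can reach r and r can reach every vertex, any two vertices are connected through r.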

The two common ways to represent a graph in a data structure are the adjacency list and the adjacency matrix. The former uses Θ(|V| + |E|) space and is better for sparse graphs, while the latter uses Θ(|V|²) space and is better for dense graphs. Also, for a directed graph, an adjacency matrix allows the edges to be followed in both directions at the same cost (by reading a row or a column), which is not the case for an adjacency list: finding the edges that enter a vertex requires scanning the lists of all vertices.

In the following algorithms, these two representations are unified by using G.adj[u] to represent the set {v | ‹u, v› is in G.E}.
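A minimal Python sketch of the two representations, assuming vertices are given as a list and edges as (u, v) pairs; all helper names are hypothetical. The last two functions read the unified G.adj[u] view from a matrix row, and follow edges backwards by reading a column instead, at the same Θ(|V|) cost:

```python
def adjacency_list(vertices, edges):
    """adj[u] lists every v with an edge <u, v>: Θ(|V| + |E|) space."""
    adj = {u: [] for u in vertices}
    for u, v in edges:
        adj[u].append(v)
    return adj

def adjacency_matrix(vertices, edges):
    """m[i][j] == 1 iff there is an edge from vertices[i] to vertices[j]: Θ(|V|²) space."""
    index = {u: i for i, u in enumerate(vertices)}
    m = [[0] * len(vertices) for _ in vertices]
    for u, v in edges:
        m[index[u]][index[v]] = 1
    return m

def successors(vertices, m, u):
    """G.adj[u] read from row u of the matrix (Θ(|V|) time)."""
    i = vertices.index(u)
    return [vertices[j] for j, bit in enumerate(m[i]) if bit]

def predecessors(vertices, m, v):
    """Edges followed backwards: read column v instead of a row, at the same cost."""
    j = vertices.index(v)
    return [vertices[i] for i, row in enumerate(m) if row[j]]

V = ["a", "b", "c"]
E = [("a", "b"), ("a", "c"), ("c", "b")]
M = adjacency_matrix(V, E)
print(adjacency_list(V, E))     # {'a': ['b', 'c'], 'b': [], 'c': ['b']}
print(successors(V, M, "a"))    # ['b', 'c']
print(predecessors(V, M, "b"))  # ['a', 'c']
```

With the dict-of-lists form, the same predecessor query would have to scan every list, which is the asymmetry mentioned above.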

A graph is weighted if every edge has a number attached to represent a measurement on the relation, such as distance or cost; the weights can be stored by augmenting the adjacency list or matrix.
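One possible way to augment both representations with weights, sketched under the assumption that edges arrive as (u, v, w) triples; the function names are hypothetical. The list becomes a dict of dicts, and the matrix stores the weight itself, using infinity for a missing edge so that 0 remains a usable weight:

```python
import math

def weighted_adjacency_list(vertices, edges):
    """w_adj[u][v] is the weight of edge <u, v>."""
    w_adj = {u: {} for u in vertices}
    for u, v, w in edges:
        w_adj[u][v] = w
    return w_adj

def weighted_adjacency_matrix(vertices, edges):
    """m[i][j] holds the weight; math.inf marks "no edge", so 0 stays a legal weight."""
    index = {u: i for i, u in enumerate(vertices)}
    m = [[math.inf] * len(vertices) for _ in vertices]
    for u, v, w in edges:
        m[index[u]][index[v]] = w
    return m

V = ["a", "b", "c"]
E = [("a", "b", 4), ("a", "c", 1), ("c", "b", 2)]
print(weighted_adjacency_list(V, E))       # {'a': {'b': 4, 'c': 1}, 'b': {}, 'c': {'b': 2}}
print(weighted_adjacency_matrix(V, E)[0])  # [inf, 4, 1]
```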

2. Search

In graphs, various search (traversal) algorithms systematically visit every vertex, normally by following the edges.

Breadth-First Search (BFS): starting from a source vertex s, it visits the vertices layer by layer, in order of increasing distance from s. In the algorithm, a "color" is attached to every node to indicate its status: not yet reached (white), reached but still being processed (gray), or fully processed (black). For each node, its predecessor (π) and distance (d) on the path from the source node are recorded. A queue Q holds the nodes under processing.
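A minimal Python sketch of BFS with the colors, π and d described above, assuming the graph is a dict mapping every vertex to a list of its neighbours; None stands in for NIL and math.inf for an undefined distance:

```python
from collections import deque
import math

WHITE, GRAY, BLACK = "white", "gray", "black"

def bfs(adj, s):
    """BFS from source s; adj maps each vertex to a list of its neighbours.
    Returns (color, d, pi): final colors, distances in edges, predecessors."""
    color = {u: WHITE for u in adj}
    d = {u: math.inf for u in adj}   # inf: not (yet) reached
    pi = {u: None for u in adj}      # None plays the role of NIL
    color[s], d[s] = GRAY, 0
    Q = deque([s])
    while Q:
        u = Q.popleft()
        for v in adj[u]:
            if color[v] == WHITE:    # v is discovered for the first time
                color[v] = GRAY
                d[v] = d[u] + 1
                pi[v] = u
                Q.append(v)
        color[u] = BLACK             # every neighbour of u has been examined
    return color, d, pi

# Undirected example graph (each edge listed at both ends); e cannot be reached from s.
G = {"s": ["a", "b"], "a": ["s", "c"], "b": ["s", "c"],
     "c": ["a", "b", "d"], "d": ["c"], "e": []}
color, d, pi = bfs(G, "s")
print(d)  # {'s': 0, 'a': 1, 'b': 1, 'c': 2, 'd': 3, 'e': inf}
```

Vertices that stay WHITE (here e) were never reached, which is why the running-time statement below assumes every node is reachable from the source.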

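Under the assumption that all nodes can be reached from the source node, the running time of BFS is Θ(V + E): each vertex is enqueued and dequeued exactly once, and each adjacency list is scanned once, when its vertex is dequeued.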

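As a by-product, this algorithm also generates a search tree rooted at s (given by the π values) and finds a shortest path, measured in number of edges, from s to every reachable vertex in G. In line 13, if color[v] is not WHITE, then a cycle is detected (though in an undirected graph, the case where v is the predecessor of u must be excluded, otherwise every edge would form a cycle; in a directed graph, the cycle found this way may exist only when edge directions are ignored).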
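Both by-products can be sketched in Python, assuming a simple undirected graph (no self-loops or parallel edges) stored as a dict of neighbour lists; the function names are hypothetical. A shortest path is read backwards along π, and the cycle test skips the neighbour that is u's own predecessor:

```python
from collections import deque

def bfs_predecessors(adj, s):
    """pi map of the BFS tree rooted at s; only reachable vertices become keys."""
    pi = {s: None}
    Q = deque([s])
    while Q:
        u = Q.popleft()
        for v in adj[u]:
            if v not in pi:          # v is still WHITE
                pi[v] = u
                Q.append(v)
    return pi

def shortest_path(adj, s, t):
    """A path from s to t with the fewest edges, read backwards along pi; None if unreachable."""
    pi = bfs_predecessors(adj, s)
    if t not in pi:
        return None
    path = []
    while t is not None:
        path.append(t)
        t = pi[t]
    return path[::-1]

def has_cycle_undirected(adj, s):
    """Cycle test on the part of a simple undirected graph reachable from s:
    a non-WHITE neighbour v of u closes a cycle unless v is u's own predecessor."""
    pi = {s: None}
    Q = deque([s])
    while Q:
        u = Q.popleft()
        for v in adj[u]:
            if v not in pi:
                pi[v] = u
                Q.append(v)
            elif v != pi[u]:
                return True
    return False

square = {1: [2, 4], 2: [1, 3], 3: [2, 4], 4: [3, 1]}
tree = {1: [2, 3], 2: [1, 4], 3: [1], 4: [2]}
print(shortest_path(square, 1, 3))                                     # [1, 2, 3]
print(has_cycle_undirected(square, 1), has_cycle_undirected(tree, 1))  # True False
```

Without the `v != pi[u]` exclusion, every tree edge would be seen again from the child's side and reported as a cycle.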

A search algorithm that tries to go as deep as possible at each step is a Depth-First Search (DFS) algorithm.

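If the queue in the above BFS algorithm is changed into a stack, the algorithm will do DFS (strictly, a vertex should then be marked when it is removed from the stack rather than when it is added, otherwise the visiting order and the resulting tree may differ from those of a true DFS); if the queue is changed into a priority queue, the algorithm will do Best-First Search. In each case, it can only reach the nodes that are reachable from the starting one.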
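A sketch of this idea in Python, assuming a dict of successor lists. The hypothetical `generic_search` keeps one loop and swaps only the frontier: a deque used as a queue, a list used as a stack, or a `heapq` priority queue keyed by a caller-supplied score h (a hypothetical heuristic). Vertices are marked when they leave the frontier, so the stack variant follows a genuine depth-first order:

```python
import heapq
from collections import deque

def generic_search(adj, s, kind="queue", h=None):
    """Visit the vertices reachable from s; the frontier type fixes the order.
    kind="queue": breadth-first; kind="stack": depth-first;
    kind="priority": best-first, smallest h(v) first (h is a caller-supplied score).
    A vertex is marked visited when it leaves the frontier, so it may be added
    more than once but is visited only once."""
    visited, order = set(), []
    tie = 0                          # tie-breaker so heapq never compares vertices
    if kind == "queue":
        frontier = deque([s])
    elif kind == "stack":
        frontier = [s]
    else:
        frontier = [(h(s), tie, s)]
    while frontier:
        if kind == "queue":
            u = frontier.popleft()
        elif kind == "stack":
            u = frontier.pop()
        else:
            u = heapq.heappop(frontier)[2]
        if u in visited:
            continue
        visited.add(u)
        order.append(u)
        # For a stack, push neighbours in reverse so they are explored in list order.
        nbrs = reversed(adj[u]) if kind == "stack" else adj[u]
        for v in nbrs:
            if v not in visited:
                if kind == "priority":
                    tie += 1
                    heapq.heappush(frontier, (h(v), tie, v))
                else:
                    frontier.append(v)
    return order

G = {"s": ["a", "b"], "a": ["c"], "b": ["c"], "c": []}
score = {"s": 0, "a": 5, "b": 1, "c": 2}.get      # hypothetical heuristic values
print(generic_search(G, "s", "queue"))            # ['s', 'a', 'b', 'c']
print(generic_search(G, "s", "stack"))            # ['s', 'a', 'c', 'b']
print(generic_search(G, "s", "priority", score))  # ['s', 'b', 'c', 'a']
```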

The following is a recursive DFS that does not specify a starting node; it may generate a search forest containing more than one tree when some nodes cannot be reached from the first node. The previous BFS algorithm can be revised to do so, too, though it will not be recursive.

This algorithm uses two timestamps for each vertex: d records when the vertex is discovered (turns from white to gray), and f records when it is finished (turns from gray to black). This information is used by other algorithms introduced later.
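A minimal recursive sketch in Python, assuming a dict that maps every vertex to a list of successors. It restarts from every still-white vertex, so it builds a forest, and records d, f and π as described:

```python
WHITE, GRAY, BLACK = "white", "gray", "black"

def dfs(adj):
    """DFS over all vertices of adj (dict: vertex -> list of successors).
    Returns (d, f, pi): discovery times, finishing times and predecessors;
    vertices whose pi is None are the roots of the trees in the search forest."""
    color = {u: WHITE for u in adj}
    d, f = {}, {}
    pi = {u: None for u in adj}
    time = 0

    def visit(u):
        nonlocal time
        time += 1
        d[u] = time                  # WHITE -> GRAY: u is discovered
        color[u] = GRAY
        for v in adj[u]:
            if color[v] == WHITE:
                pi[v] = u
                visit(v)
        color[u] = BLACK             # GRAY -> BLACK: u is finished
        time += 1
        f[u] = time

    for u in adj:                    # start a new tree at every still-WHITE vertex
        if color[u] == WHITE:
            visit(u)
    return d, f, pi

G = {"u": ["v", "x"], "v": ["y"], "w": ["y", "z"],
     "x": ["v"], "y": ["x"], "z": ["z"]}
d, f, pi = dfs(G)
print({u: (d[u], f[u]) for u in G})
# {'u': (1, 8), 'v': (2, 7), 'w': (9, 12), 'x': (4, 5), 'y': (3, 6), 'z': (10, 11)}
```

Here w cannot be reached from u, so the forest has two trees, rooted at u and w. For very deep graphs, Python's default recursion limit may be reached; an explicit stack avoids that.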

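Edges not in the tree are labeled B, F, and C for "back", "forward", and "cross", respectively, depending on how their two vertices are related in the search forest: a back edge goes from a vertex to one of its ancestors, a forward edge goes from a vertex to one of its descendants that is not its child, and a cross edge connects two vertices neither of which is an ancestor of the other.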
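For a directed graph, these labels can be assigned during DFS by looking at the color of v when edge ‹u, v› is explored. The sketch below assumes a dict of successor lists; `classify_edges` is a hypothetical name. WHITE means a tree edge, GRAY a back edge (v is still on the recursion stack), and BLACK a forward or cross edge depending on whether u was discovered before v:

```python
WHITE, GRAY, BLACK = "white", "gray", "black"

def classify_edges(adj):
    """DFS over all vertices of a directed graph, labelling every edge <u, v>
    as T (tree), B (back), F (forward) or C (cross)."""
    color = {u: WHITE for u in adj}
    d = {}
    time = 0
    kind = {}

    def visit(u):
        nonlocal time
        time += 1
        d[u] = time
        color[u] = GRAY
        for v in adj[u]:
            if color[v] == WHITE:
                kind[(u, v)] = "T"   # v is discovered through this edge
                visit(v)
            elif color[v] == GRAY:
                kind[(u, v)] = "B"   # v is an ancestor of u (or u itself)
            elif d[u] < d[v]:
                kind[(u, v)] = "F"   # v is a finished descendant of u
            else:
                kind[(u, v)] = "C"   # v is in another subtree or an earlier tree
        color[u] = BLACK

    for u in adj:
        if color[u] == WHITE:
            visit(u)
    return kind

G = {"u": ["v", "x"], "v": ["y"], "w": ["y", "z"],
     "x": ["v"], "y": ["x"], "z": ["z"]}
for edge, label in classify_edges(G).items():
    print(edge, label)
```

The self-loop ‹z, z› comes out as a back edge, which matches the earlier remark that a single edge can form a cycle in a directed graph.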

The running time of DFS is also Θ(V + E).

3. Topological sort

A topological sort of a directed acyclic graph ("dag") G = (V, E) is a linear ordering of all its vertices such that if G contains an edge (u, v), then u appears before v in the ordering. If the graph is cyclic, no such ordering is possible.

Topological sorting can be accomplished by repeatedly removing a vertex v that has no incoming edges, together with its outgoing edges, and appending v to the ordering, until the vertex set is empty or no longer contains such a vertex (which happens when the graph is cyclic).
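The removal procedure (often called Kahn's algorithm) can be sketched in Python without modifying the graph, assuming a dict of successor lists: an in-degree counter per vertex stands in for "incoming edges still present", and decrementing it plays the role of removing an edge. The function name is hypothetical:

```python
from collections import deque

def topological_sort_by_removal(adj):
    """Repeatedly remove a vertex with no incoming edges.
    Returns an ordering of all vertices, or None if the graph has a cycle."""
    indegree = {u: 0 for u in adj}
    for u in adj:
        for v in adj[u]:
            indegree[v] += 1
    ready = deque(u for u in adj if indegree[u] == 0)
    order = []
    while ready:
        u = ready.popleft()
        order.append(u)
        for v in adj[u]:             # "remove" the out-going edges of u
            indegree[v] -= 1
            if indegree[v] == 0:
                ready.append(v)
    # If vertices remain, each still has an in-coming edge, so the graph is cyclic.
    return order if len(order) == len(adj) else None

dag = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
print(topological_sort_by_removal(dag))                       # ['a', 'b', 'c', 'd']
print(topological_sort_by_removal({"a": ["b"], "b": ["a"]}))  # None
```

Each vertex and each edge is handled a constant number of times, so this version also runs in Θ(V + E).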

The following solution uses the previous DFS algorithm.

Its complexity is Θ(V + E).
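A Python sketch of a DFS-based solution, assuming the common formulation in which each vertex is put at the front of the output when it finishes, so the vertices appear in decreasing order of f. The graph is assumed to be a dict of successor lists, and reaching a GRAY vertex (a back edge) is treated as proof of a cycle; the function name is hypothetical:

```python
def topological_sort_dfs(adj):
    """Topological sort by DFS: vertices in decreasing order of finishing time.
    Raises ValueError if a back edge (hence a cycle) is found."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {u: WHITE for u in adj}
    order = []                       # vertices in increasing finishing time

    def visit(u):
        color[u] = GRAY
        for v in adj[u]:
            if color[v] == GRAY:     # back edge: the graph is not a dag
                raise ValueError("graph has a cycle")
            if color[v] == WHITE:
                visit(v)
        color[u] = BLACK
        order.append(u)              # u finishes after all its descendants

    for u in adj:
        if color[u] == WHITE:
            visit(u)
    return order[::-1]               # decreasing finishing time

dag = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
print(topological_sort_dfs(dag))  # ['a', 'c', 'b', 'd']
```

The result differs from the removal-based order but is equally valid: for every edge (u, v) in a dag, v finishes before u.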

4. Strongly connected components

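DFS can also be used to decompose a directed graph into its strongly connected components. In the algorithm, the transpose of G, G^T, is obtained by reversing the direction of all the edges of G. G^T and G have the same strongly connected components, so running DFS on both graphs in the proper order identifies them: a first DFS on G computes the finishing times, then a DFS on G^T considers the vertices in its main loop in order of decreasing finishing time, and each tree in the resulting forest is one strongly connected component.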
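A compact Python sketch of this two-pass scheme (known in some sources as Kosaraju's algorithm), assuming a dict of successor lists; the function name is hypothetical. The first pass only needs the vertices in increasing finishing time, so the exact timestamps are not stored:

```python
def strongly_connected_components(adj):
    """SCCs by two DFS passes: DFS on G for finishing order, then DFS on G^T
    taking start vertices in decreasing finishing time; each tree is one SCC."""
    def transpose(g):
        t = {u: [] for u in g}
        for u in g:
            for v in g[u]:
                t[v].append(u)
        return t

    # Pass 1: DFS on G, recording vertices in increasing finishing time.
    visited, finished = set(), []
    def visit1(u):
        visited.add(u)
        for v in adj[u]:
            if v not in visited:
                visit1(v)
        finished.append(u)
    for u in adj:
        if u not in visited:
            visit1(u)

    # Pass 2: DFS on G^T; the main loop follows decreasing finishing time.
    gt = transpose(adj)
    assigned, components = set(), []
    def visit2(u, comp):
        assigned.add(u)
        comp.append(u)
        for v in gt[u]:
            if v not in assigned:
                visit2(v, comp)
    for u in reversed(finished):
        if u not in assigned:
            comp = []
            visit2(u, comp)
            components.append(comp)
    return components

G = {"a": ["b"], "b": ["c", "e", "f"], "c": ["d", "g"], "d": ["c", "h"],
     "e": ["a", "f"], "f": ["g"], "g": ["f", "h"], "h": ["h"]}
print(strongly_connected_components(G))
# [['a', 'e', 'b'], ['c', 'd'], ['g', 'f'], ['h']]
```

Each pass is an ordinary DFS and building G^T costs Θ(V + E), which agrees with the complexity stated for the algorithm.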

The correctness of this algorithm is justified using the properties of DFS and is explained in the textbook. The complexity of the algorithm is Θ(V + E), the same as DFS.