Graphs (1)
Formally, a graph is G = (V, E), where V is a set of vertices and E is a set of edges, each being a pair ‹s, d› (from vertex s to vertex d), so 0 ≤ |E| ≤ |V|². The "size" of G should be represented by both |V| and |E|, though sometimes f(|V|, |E|) is simplified to f(V, E).
A graph is either directed or undirected, depending on whether the edge relation is symmetric: in an undirected graph, ‹s, d› is an edge exactly when ‹d, s› is.
A path is a sequence of end-to-end edges, and its length can be zero or any positive integer. In a directed graph, a cycle is a path of non-zero length with the same starting and ending vertex.
An undirected graph is connected if there is a path from every vertex to every other vertex. A directed graph is strongly connected if there is a (directed) path from every vertex to every other vertex; it is weakly connected if the graph is connected after all directed edges are turned into undirected edges.
The two common ways to represent a graph in a data structure are an adjacency matrix and an adjacency list. The former uses Θ(|V|²) space and is better for dense graphs, while the latter uses Θ(|V| + |E|) space and is better for sparse graphs. Also, for a directed graph, an adjacency matrix representation allows the edges to be followed in both directions with the same cost, which is not the case for an adjacency list representation.
In the following algorithms, these two representations are unified by using G.adj[u] to represent the set {v | ‹u, v› is in G.E}.
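As a small illustration of the two representations and the unified view G.adj[u], the following Python sketch builds both for the same directed graph. The function names and the sample edge list are made up for this example; vertices are assumed to be numbered 0 to |V| - 1.

```python
def build_adj_matrix(num_vertices, edges):
    """Adjacency matrix: Theta(|V|^2) space, matrix[s][d] == 1 iff <s, d> is an edge."""
    matrix = [[0] * num_vertices for _ in range(num_vertices)]
    for s, d in edges:
        matrix[s][d] = 1
    return matrix


def build_adj_list(num_vertices, edges):
    """Adjacency list: Theta(|V| + |E|) space, one list of successors per vertex."""
    adj = {u: [] for u in range(num_vertices)}
    for s, d in edges:
        adj[s].append(d)
    return adj


def adj_from_matrix(matrix, u):
    """Recover the set {v | <u, v> is an edge} from the matrix row of u."""
    return {v for v, present in enumerate(matrix[u]) if present}


# Hypothetical sample graph with 3 vertices and 4 directed edges.
edges = [(0, 1), (0, 2), (1, 2), (2, 0)]
matrix = build_adj_matrix(3, edges)
adj = build_adj_list(3, edges)
```

Either structure can answer "which vertices follow u?", but the matrix needs a scan of a whole row of length |V|, while the list only touches the actual successors.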
A graph is weighted if every edge has an associated number representing distance, cost, etc.; the weights can be stored by augmenting the adjacency matrix or list.
Graphs are more general than trees and linear data structures, and they can be further generalized into hypergraphs, multigraphs, etc.
Breadth-First Search (BFS): starting from a source vertex, visit the reachable vertices layer by layer, in order of increasing distance. In the following algorithm, a "color" is attached to every node to indicate its status: to be processed (white), being processed (gray), has been processed (black). For each node, its predecessor (π) and distance (d) are recorded (on the path from the source node). A queue Q is used to hold the nodes under processing.
Example:
Under the assumption that all nodes can be reached from the source node, the running time of BFS is Θ(V + E).
As a by-product, this algorithm also generates a search tree with s as root, and finds a shortest path (in number of edges) between s and every reachable vertex in G. In line 13, if color[v] is not WHITE and v is not the predecessor of the vertex being processed, then an (undirected) cycle is detected.
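The BFS description above can be sketched in Python as follows. This is a minimal version written from the prose, not a transcription of the original pseudocode, so its line numbers do not correspond to those cited in the notes. The graph is assumed to be a dict mapping each vertex to a list of neighbors, and the sample graph is made up.

```python
from collections import deque

WHITE, GRAY, BLACK = "white", "gray", "black"


def bfs(adj, s):
    """Breadth-first search from s; returns distance and predecessor maps."""
    color = {u: WHITE for u in adj}   # every vertex starts unprocessed
    d = {u: None for u in adj}        # None means "not reached"
    pi = {u: None for u in adj}
    color[s] = GRAY
    d[s] = 0
    q = deque([s])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if color[v] == WHITE:     # first time v is seen
                color[v] = GRAY
                d[v] = d[u] + 1
                pi[v] = u
                q.append(v)
        color[u] = BLACK              # all neighbors of u have been examined
    return d, pi


# Hypothetical undirected graph; vertex "e" is isolated.
graph = {
    "s": ["a", "b"],
    "a": ["s", "c"],
    "b": ["s", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
    "e": [],
}
dist, pred = bfs(graph, "s")
```

Following pred backwards from any reached vertex gives a shortest path to s; vertices that cannot be reached keep a distance of None.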
A search algorithm that tries to go as deep as possible at each step is a Depth-First Search (DFS) algorithm.
If the queue in the above BFS algorithm is changed into a stack, the algorithm will do DFS; if the queue is changed into a priority queue, the algorithm will do Best-First Search. In each case, it can only reach the nodes connected to the starting one.
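The effect of swapping the container can be sketched with one search loop and three kinds of frontier. This is an illustrative variant rather than the BFS algorithm of the notes: a vertex is marked only when it is removed from the frontier (not when it is inserted), which is an assumption made here so that the stack version produces a genuine depth-first order. The function name, the priority function, and the sample graph are all hypothetical.

```python
from collections import deque
import heapq


def search_order(adj, s, frontier="queue", priority=None):
    """Return the vertices reachable from s in the order they are processed."""
    counter = 0                      # tie-breaker for equal priorities
    if frontier == "priority":
        pending = [(priority(s), counter, s)]
    else:
        pending = deque([s])
    done = set()
    order = []
    while pending:
        if frontier == "queue":
            u = pending.popleft()    # oldest entry first: breadth-first
        elif frontier == "stack":
            u = pending.pop()        # newest entry first: depth-first
        else:
            u = heapq.heappop(pending)[2]  # smallest priority first: best-first
        if u in done:
            continue                 # stale duplicate entry
        done.add(u)
        order.append(u)
        for v in adj[u]:
            if v not in done:
                if frontier == "priority":
                    counter += 1
                    heapq.heappush(pending, (priority(v), counter, v))
                else:
                    pending.append(v)
    return order


# Hypothetical graph; vertex 5 is not reachable from vertex 1.
graph = {1: [2, 3], 2: [4], 3: [4], 4: [], 5: [1]}
rank = {1: 0, 2: 5, 3: 1, 4: 2, 5: 9}   # made-up priorities, lower is better

bfs_order = search_order(graph, 1, "queue")
dfs_order = search_order(graph, 1, "stack")
best_order = search_order(graph, 1, "priority", priority=rank.get)
```

In all three runs vertex 5 is never visited, since only the vertices connected to the starting one are reached.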
The following is a recursive DFS which does not specify a starting node, and may generate a search forest containing more than one tree when the graph is not strongly connected. The previous BFS algorithm can be revised to do so, too, though it won't be recursive.
This algorithm uses two time-stamps, d and f, to record the time of color changing for each vertex. This information will be used in the other algorithms to be introduced later.
Example:
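Edges not in the tree are labeled B, F, and C for "back", "forward", and "cross", respectively, depending on how the two vertices are linked in the search forest: a back edge leads to an ancestor, a forward edge leads to a descendant, and a cross edge joins two vertices neither of which is an ancestor of the other.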
The running time of DFS is also Θ(V + E).
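A recursive DFS with the two time-stamps and the edge labels can be sketched as below. It is written from the description in these notes rather than copied from the original pseudocode; the label "T" for tree edges and the sample graph are assumptions of this example. Python's recursion limit makes this form suitable only for small graphs.

```python
WHITE, GRAY, BLACK = "white", "gray", "black"


def dfs(adj):
    """Depth-first search over all vertices; returns d, f, pi and edge labels."""
    color = {u: WHITE for u in adj}
    pi = {u: None for u in adj}
    d, f = {}, {}            # discovery and finishing times
    edge_type = {}           # (u, v) -> "T", "B", "F" or "C"
    time = 0

    def visit(u):
        nonlocal time
        time += 1
        d[u] = time          # u turns gray
        color[u] = GRAY
        for v in adj[u]:
            if color[v] == WHITE:
                edge_type[(u, v)] = "T"   # tree edge
                pi[v] = u
                visit(v)
            elif color[v] == GRAY:
                edge_type[(u, v)] = "B"   # v is an ancestor still in progress
            elif d[u] < d[v]:
                edge_type[(u, v)] = "F"   # v is an already finished descendant
            else:
                edge_type[(u, v)] = "C"   # v finished before u was discovered
        color[u] = BLACK
        time += 1
        f[u] = time          # u turns black

    for u in adj:            # no fixed starting node: restart from every white vertex
        if color[u] == WHITE:
            visit(u)
    return d, f, pi, edge_type


# Hypothetical directed graph with six vertices.
graph = {
    "u": ["v", "x"],
    "v": ["y"],
    "x": ["v"],
    "y": ["x"],
    "w": ["y", "z"],
    "z": ["z"],
}
d, f, pi, edge_type = dfs(graph)
```

Here the search forest has two trees, rooted at "u" and "w". Each vertex and each edge is handled once, which is where the Θ(V + E) bound comes from.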
The following algorithm uses DFS to do topological sorting:
Its complexity is Θ(V + E).
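One common way to do topological sorting with DFS is to output the vertices in decreasing order of finishing time. The sketch below assumes that this is the approach meant and that the input is acyclic; it does not detect cycles. The function name and the sample graph are made up.

```python
def topological_sort(adj):
    """Return the vertices of a DAG so that every edge goes from left to right."""
    visited = set()
    finished = []            # vertices in increasing order of finishing time

    def visit(u):
        visited.add(u)
        for v in adj[u]:
            if v not in visited:
                visit(v)
        finished.append(u)   # u finishes after everything reachable from it

    for u in adj:
        if u not in visited:
            visit(u)
    finished.reverse()       # decreasing finishing time
    return finished


# Hypothetical dependency graph: an edge u -> v means "u before v".
clothes = {
    "shirt": ["tie", "belt"],
    "tie": ["jacket"],
    "belt": ["jacket"],
    "trousers": ["belt", "shoes"],
    "socks": ["shoes"],
    "shoes": [],
    "jacket": [],
}
order = topological_sort(clothes)
```

A DAG usually has several valid orders; which one is produced depends on the order in which vertices and neighbors are visited.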
Example:
Another solution to the problem is to repeatedly remove a vertex v that has no incoming edges, together with its outgoing edges, until the vertex set is empty or no such vertex remains (which happens when the graph is cyclic).
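The removal-based approach can be sketched by keeping a count of incoming edges per vertex instead of physically deleting anything. The function name, the choice to return None for a cyclic graph, and the sample graphs are assumptions of this example.

```python
from collections import deque


def topological_sort_by_removal(adj):
    """Return a topological order, or None if the graph contains a cycle."""
    indegree = {u: 0 for u in adj}
    for u in adj:
        for v in adj[u]:
            indegree[v] += 1
    ready = deque(u for u in adj if indegree[u] == 0)  # no incoming edges
    order = []
    while ready:
        u = ready.popleft()
        order.append(u)              # "remove" u from the graph
        for v in adj[u]:             # and with it every outgoing edge
            indegree[v] -= 1
            if indegree[v] == 0:
                ready.append(v)
    if len(order) != len(adj):       # some vertices could never be removed
        return None
    return order


# Hypothetical inputs: one acyclic graph and one with a cycle b -> c -> b.
dag = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
cyclic = {"a": ["b"], "b": ["c"], "c": ["b"]}
dag_order = topological_sort_by_removal(dag)
cyclic_order = topological_sort_by_removal(cyclic)
```

Unlike the DFS-based version, this one notices a cyclic input, because the vertices on a cycle never reach an in-degree of zero.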
Example:
By considering vertices in the second depth-first search (b) in decreasing order of the finishing times that were computed in the first depth-first search (a), the algorithm visits the vertices of the component graph (c) in topologically sorted order.
Since each major step is linear, the algorithm is Θ(V + E).
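The two-pass description above matches the usual DFS-based method for finding strongly connected components. Assuming that this is the algorithm meant, and assuming that the second search runs on the graph with all edges reversed (a step the surviving text does not state), a sketch looks like this. The names and the sample graph are made up.

```python
def strongly_connected_components(adj):
    """Return the strongly connected components as lists of vertices."""
    # First pass: record the vertices in increasing order of finishing time.
    visited = set()
    finish_order = []

    def visit(u):
        visited.add(u)
        for v in adj[u]:
            if v not in visited:
                visit(v)
        finish_order.append(u)

    for u in adj:
        if u not in visited:
            visit(u)

    # Build the reversed graph (assumed step, see the lead-in).
    radj = {u: [] for u in adj}
    for u in adj:
        for v in adj[u]:
            radj[v].append(u)

    # Second pass: search the reversed graph in decreasing finishing time.
    assigned = set()
    components = []

    def collect(u, comp):
        assigned.add(u)
        comp.append(u)
        for v in radj[u]:
            if v not in assigned:
                collect(v, comp)

    for u in reversed(finish_order):
        if u not in assigned:
            comp = []
            collect(u, comp)         # each tree of this pass is one component
            components.append(comp)
    return components


# Hypothetical graph: cycle a -> b -> c -> a, cycle d -> e -> d, and edge c -> d.
graph = {"a": ["b"], "b": ["c"], "c": ["a", "d"], "d": ["e"], "e": ["d"]}
components = strongly_connected_components(graph)
```

Both passes and the edge reversal each take time linear in |V| + |E|, in line with the bound stated above.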