Greedy algorithms
Greedy algorithm: always select the best step among the immediate choices, without looking ahead.
Advantage: simple and efficient. Disadvantage: may miss the globally best solution.
Application: connecting a number of computers using the least total length of cable (minimum spanning tree, MST).
Each added edge joins two trees, reducing the number of trees by one in each step. The cut property.
Prim's algorithm: grow a single tree from a start vertex, repeatedly adding the lightest edge that connects the tree to a vertex outside it (so no cycle is produced), using a priority queue (heap). Compare with Dijkstra's algorithm.
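A minimal sketch of Prim's algorithm with a binary heap, to make the tree-growing step concrete. The adjacency-list format (a dict mapping each vertex to a list of (weight, neighbor) pairs) and the function name prim_mst are assumptions made for this illustration, and the graph is assumed to be connected and undirected.

```python
import heapq

def prim_mst(graph, start):
    """Return (total_weight, edges) of an MST of a connected undirected graph.

    graph: dict mapping vertex -> list of (weight, neighbor) pairs
           (an assumed input format for this sketch).
    """
    in_tree = {start}
    # Candidate edges leaving the tree, ordered by weight.
    heap = [(w, start, v) for w, v in graph[start]]
    heapq.heapify(heap)
    total, edges = 0, []
    while heap and len(in_tree) < len(graph):
        w, u, v = heapq.heappop(heap)
        if v in in_tree:
            continue  # both endpoints already in the tree: would form a cycle
        in_tree.add(v)
        total += w
        edges.append((u, v, w))
        for w2, x in graph[v]:
            if x not in in_tree:
                heapq.heappush(heap, (w2, v, x))
    return total, edges

# Small example graph (made up for illustration).
example_graph = {
    "A": [(1, "B"), (3, "C")],
    "B": [(1, "A"), (2, "C"), (5, "D")],
    "C": [(3, "A"), (2, "B"), (4, "D")],
    "D": [(4, "C"), (5, "B")],
}
total, edges = prim_mst(example_graph, "A")
```

The loop has the same shape as Dijkstra's algorithm; the difference is the heap key, which here is the weight of a single edge rather than the distance from the start vertex.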
Kruskal's algorithm: repeatedly add the next lightest edge that doesn't produce a cycle.
Representing each disjoint set by a tree, where each node points to its parent and the set is identified by its root (which points to itself).
Kruskal's algorithm has complexity O(|E| log |V|).
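A minimal sketch of Kruskal's algorithm on top of the parent-pointer disjoint-set representation described above. The names find and kruskal_mst, the (weight, u, v) edge format, and the use of union by rank with path halving are assumptions of this sketch, not details given in the notes.

```python
def find(parent, x):
    """Follow parent pointers to the root; the root points to itself."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving (an optional speed-up)
        x = parent[x]
    return x

def kruskal_mst(vertices, edges):
    """Return the MST edges of a connected undirected graph.

    edges: iterable of (weight, u, v) tuples (an assumed input format).
    """
    parent = {v: v for v in vertices}  # every vertex starts as its own tree
    rank = {v: 0 for v in vertices}
    mst = []
    for w, u, v in sorted(edges):      # lightest edge first
        ru, rv = find(parent, u), find(parent, v)
        if ru == rv:
            continue                   # same tree: the edge would form a cycle
        if rank[ru] < rank[rv]:
            ru, rv = rv, ru
        parent[rv] = ru                # union: one tree fewer
        if rank[ru] == rank[rv]:
            rank[ru] += 1
        mst.append((u, v, w))
    return mst

# Small example (made up for illustration).
example_edges = [(1, "A", "B"), (2, "B", "C"), (3, "A", "C"),
                 (4, "C", "D"), (5, "B", "D")]
mst = kruskal_mst("ABCD", example_edges)
```

In this sketch the sort dominates the running time, which is where the O(|E| log |V|) bound comes from (log |E| is at most 2 log |V|).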
Basic assumption: the data are sequences of basic symbols or signals from a finite set, which are coded into binary codewords. The same code table is used in both encoding and decoding.
Example: the most efficient way to code messages consisting of {A,B,C,D}.
What if the symbols have very different probabilities of appearing?
Idea: variable-length code. Average codeword length.
Condition for the code to be usable: the prefix-free property. Codeword table as full binary tree. Example. Encoding and decoding process. Why the prefix-free property is guaranteed.
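A small sketch of encoding and decoding with a prefix-free variable-length code, and of the average codeword length. The code table and the probabilities below are hypothetical values chosen for illustration; they are not taken from the lecture's example.

```python
# Hypothetical prefix-free code table for the symbols {A, B, C, D}.
CODE = {"A": "0", "B": "10", "C": "110", "D": "111"}
# Hypothetical symbol probabilities.
PROB = {"A": 0.5, "B": 0.25, "C": 0.125, "D": 0.125}

def encode(message, code=CODE):
    """Concatenate the codewords of the symbols in the message."""
    return "".join(code[s] for s in message)

def decode(bits, code=CODE):
    """Read bits left to right; emit a symbol as soon as a codeword matches.

    This is unambiguous because no codeword is a prefix of another.
    """
    inverse = {cw: s for s, cw in code.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in inverse:
            out.append(inverse[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a codeword")
    return "".join(out)

def average_length(code, prob):
    """Expected number of bits per symbol."""
    return sum(prob[s] * len(code[s]) for s in code)

bits = encode("ABCD")
decoded = decode(bits)
avg = average_length(CODE, PROB)
```

With these assumed probabilities the average length is 1.75 bits per symbol, compared with 2 bits for a fixed-length code over four symbols.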
Input: symbols with probability, frequency, or count in sample message. Output: encoding tree and/or code table.
The frequency of an intermediate node is the sum of the frequencies of its children. Total cost: the sum of the frequencies of all nodes except the root. Numbers of leaves and intermediate nodes: a full binary tree with n leaves has n - 1 intermediate nodes. Greedy: repeatedly merge the two nodes with the smallest frequencies.
Huffman algorithm. Demo applet.
How to prove that the algorithm produces the shortest code?
Complexity: O(n log n) if the priority queue is a heap.
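A minimal sketch of the Huffman construction using a heap as the priority queue. The function name huffman_code, the dict input format, and the tie-breaking counter are assumptions of this sketch; symbols are assumed to be strings (not tuples), since tuples are used here for intermediate nodes. Different tie-breaking can give different codewords with the same total cost.

```python
import heapq
from itertools import count

def huffman_code(freq):
    """Build a code table {symbol: codeword} from {symbol: frequency}."""
    tiebreak = count()  # unique counter so the heap never compares subtrees
    heap = [(f, next(tiebreak), sym) for sym, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Greedy step: merge the two nodes with the smallest frequencies.
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))
    _, _, tree = heap[0]

    code = {}
    def walk(node, prefix):
        if isinstance(node, tuple):      # intermediate node: (left, right)
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                            # leaf: a symbol
            code[node] = prefix or "0"   # lone symbol still gets one bit
    walk(tree, "")
    return code

# Hypothetical counts for illustration.
example_freq = {"A": 8, "B": 4, "C": 2, "D": 2}
table = huffman_code(example_freq)
cost = sum(example_freq[s] * len(table[s]) for s in example_freq)
```

For these counts the total cost is 28 bits, which equals the sum of the frequencies of all nodes except the root (8 + 4 + 2 + 2 for the leaves, plus 4 and 8 for the two non-root intermediate nodes).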
Problem: Given some subsets of a set, find the smallest collection of them whose union contains every element of the set.
Solution: Repeatedly select the subset containing the largest number of uncovered elements.
Example. Greedy solution: a,f,c,j; optimal solution: b,e,i.
For n elements, if the optimal solution uses k subsets, the greedy solution uses at most k ln n. Therefore, the approximation factor of this greedy algorithm is ln n. On the other hand, the greedy algorithm runs in polynomial time, whereas set cover is NP-hard, so no polynomial-time algorithm is known for finding the optimal solution.
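A minimal sketch of the greedy set-cover rule. The function name greedy_set_cover and the small instance are made up for illustration (this is not the lecture's a..j example); the instance is chosen so that the greedy choice is not optimal.

```python
def greedy_set_cover(universe, subsets):
    """Return the names of the chosen subsets, in the order they were picked.

    subsets: dict mapping a name -> set of elements (an assumed input format).
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Greedy step: the subset covering the most still-uncovered elements.
        best = max(subsets, key=lambda name: len(subsets[name] & uncovered))
        if not subsets[best] & uncovered:
            raise ValueError("the subsets do not cover the whole set")
        chosen.append(best)
        uncovered -= subsets[best]
    return chosen

# Hypothetical instance: the optimal cover is {P, Q} (2 subsets).
example_subsets = {
    "G": {1, 2, 3, 4},
    "P": {1, 3, 5},
    "Q": {2, 4, 6},
}
picked = greedy_set_cover(range(1, 7), example_subsets)
```

Here greedy first takes G because it covers four elements, and then needs both P and Q for the remaining two, so it uses 3 subsets where 2 would suffice.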