9615-12

CIS 9615. Analysis of Algorithms

Tractability and Approximation

1. Complexity classes

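The time efficiency of an algorithm can be described by its running time on particular inputs or by the growth order of its running time as the input size increases, and it can be evaluated either relative to other algorithms or relative to a standard growth order (such as O(n²)).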

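In terms of growth order, polynomial-time algorithms are usually considered tractable (efficient, affordable).

Complexity of an algorithm (the running time of that particular algorithm) vs. complexity of a problem (the running time of the most efficient algorithm that solves it).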

General representation: An abstract problem Q is a binary relation on a set I of problem instances and a set S of problem solutions. It becomes a concrete problem when every instance is represented by a binary encoding, with the length of the string taken as the size of the instance.

A decision problem: a solution is "yes" or "no". Many other types of problems can be cast into a related decision problem that is no harder. For example, the optimization problem "Find a shortest path between u and v" can be cast into the decision problem "Is there a path between u and v of length at most k?".
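A minimal sketch of why the decision version is no harder: once the optimization version is solved, answering the decision question takes a single comparison. It assumes an unweighted graph stored as an adjacency dict, so breadth-first search finds shortest paths; the function names are illustrative only.

```python
from collections import deque


def shortest_path_length(graph, u, v):
    """Optimization version: number of edges on a shortest u-v path, or None if unreachable."""
    dist = {u: 0}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        if x == v:
            return dist[x]
        for y in graph.get(x, ()):
            if y not in dist:          # first time y is reached is along a shortest path
                dist[y] = dist[x] + 1
                queue.append(y)
    return None


def path_within(graph, u, v, k):
    """Decision version: is there a u-v path with at most k edges?"""
    d = shortest_path_length(graph, u, v)
    return d is not None and d <= k
```

For example, in the path graph a-b-c, `path_within(g, "a", "c", 2)` is true and `path_within(g, "a", "c", 1)` is false.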

A decision problem is often represented by a formal language whose sentences correspond to the problem instances whose answer is "yes". In this way, "solving a decision problem" becomes "deciding whether a sentence belongs to a language". A complexity class is a class of languages of a certain complexity.

The complexity class P is the class of languages that can be decided by a polynomial-time algorithm, i.e., for each such language there is an algorithm A and a constant k such that for any length-n binary string x, A correctly accepts or rejects x in time O(n^k).

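The complexity class NP is the class of languages that can be verified by a polynomial-time algorithm. A two-argument algorithm A verifies an input string x if there exists a certificate y (also a binary string) such that A(x, y) = 1. A language L is in NP if there are a polynomial-time algorithm A and a constant c such that x is in L if and only if some certificate y of length O(|x|^c) gives A(x, y) = 1. Intuitively, the algorithm A uses the certificate y to prove that x is in L.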
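As an illustration, here is a sketch of a polynomial-time verifier A(x, y) for Hamiltonian Cycle (the problem mentioned in the TSP example below). The encoding is an assumption made for the sketch: the instance x is a vertex count plus an edge list, and the certificate y is a proposed ordering of the vertices. The check runs in time linear in the size of the graph.

```python
def verify_ham_cycle(n, edges, certificate):
    """Verifier A(x, y) for HAM-CYCLE (illustrative encoding).

    x = (n, edges): undirected graph on vertices 0..n-1.
    y = certificate: proposed order in which a cycle visits the vertices.
    Returns 1 if y proves that x has a Hamiltonian cycle, otherwise 0.
    """
    edge_set = {frozenset(e) for e in edges}
    # The certificate must list every vertex exactly once.
    if sorted(certificate) != list(range(n)):
        return 0
    # Consecutive vertices, including last -> first, must be adjacent in G.
    for i in range(n):
        u, v = certificate[i], certificate[(i + 1) % n]
        if frozenset((u, v)) not in edge_set:
            return 0
    return 1
```

A wrong certificate only makes A return 0; it never shows that x is outside the language, which is why NP is defined through the existence of some accepting certificate.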

P ⊆ NP, since a polynomial-time algorithm that decides L is also a verifier for L that simply ignores the certificate. Whether P = NP remains an open problem.

A problem Q₁ can be reduced to another problem Q₂ if every instance of Q₁ can be mapped to an instance of Q₂ whose solution provides a solution to the instance of Q₁. If a language L₁ can be reduced to a language L₂ by a mapping f computable in polynomial time (x is in L₁ if and only if f(x) is in L₂), we write L₁ ≤_p L₂. In this situation, if L₂ is in P, so is L₁.

A binary language L is "NP-complete" (or "in NPC") if it is in NP and every language in NP can be reduced to it in polynomial time (L' ≤_p L for every L' in NP). By this definition, if any language in NPC is also in P, then P = NP.

Since many NPC problems are already known, we can prove that a new problem in NP belongs to NPC by reducing a known NPC problem to it in polynomial time.
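For instance, the reduction of Hamiltonian Cycle to TSP cited in the TSP example below can be sketched with the standard textbook construction (as in CLRS): build a complete graph on the same vertices, give cost 0 to edges of G and cost 1 to non-edges, and set the bound k = 0. A tour of cost at most 0 can then use only edges of G, so G has a Hamiltonian cycle exactly when the TSP instance has a tour within the bound. The function name and input encoding are illustrative assumptions.

```python
def ham_cycle_to_tsp(n, edges):
    """Polynomial-time mapping f from a HAM-CYCLE instance to a TSP decision instance.

    Input: undirected graph G on vertices 0..n-1 with the given edge list.
    Output: (cost, k), where cost is an n x n weight matrix of the complete
    graph on the same vertices and k is the bound on the total tour weight.
    Building the matrix takes O(n^2) time.
    """
    edge_set = {frozenset(e) for e in edges}
    cost = [[0 if i == j or frozenset((i, j)) in edge_set else 1
             for j in range(n)]
            for i in range(n)]
    # Edges of G cost 0 and non-edges cost 1, so a tour of total cost <= 0
    # uses only edges of G, i.e., it is a Hamiltonian cycle of G.
    return cost, 0
```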

So far, all known algorithms for NPC problems take exponential (more precisely, superpolynomial) time in the worst case, so these problems are considered "intractable" or "hard". Most researchers believe that P ≠ NP, i.e., that P and NPC are disjoint subsets of NP. However, this has not been proved.

Example of an NPC problem: the Traveling-Salesman Problem (TSP). Given a weighted undirected graph G, a tour is a closed path that goes through each vertex exactly once and finally returns to the starting point. The original TSP is to find the shortest tour in a complete graph (where there is an edge between every pair of vertices), while the related decision problem is to decide whether there is a tour with total weight at most k. This problem is in NP because, given a graph and a tour, it is easy (polynomial time) to check whether the tour satisfies the requirement; it is in NPC because a known NPC problem, Hamiltonian Cycle, can be reduced to it. The problem can be solved by exhaustively generating all possible tours, but a complete graph with n vertices has (n-1)!/2 distinct tours, so this takes exponential (in fact factorial) time.
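The two halves of this argument can be sketched side by side: a verifier for the decision problem runs in polynomial time, while exhaustive search enumerates (n-1)! vertex orderings. The sketch assumes the complete graph is given as a symmetric weight matrix `w` and that tours are vertex sequences starting at vertex 0; the function names are illustrative. (Each undirected tour is counted twice, once per direction, which is why the distinct-tour count is (n-1)!/2.)

```python
from itertools import permutations


def tour_cost(w, tour):
    """Total weight of the closed tour that visits `tour` in order and returns to tour[0]."""
    n = len(tour)
    return sum(w[tour[i]][tour[(i + 1) % n]] for i in range(n))


def verify_tsp(w, k, tour):
    """Polynomial-time verifier: is `tour` a tour of the complete graph with weight <= k?"""
    return sorted(tour) == list(range(len(w))) and tour_cost(w, tour) <= k


def tsp_brute_force(w):
    """Exhaustive search over all (n-1)! orderings that start at vertex 0.

    Returns (cost, tour) for a shortest tour; exponential time.
    """
    best = None
    for rest in permutations(range(1, len(w))):
        tour = (0,) + rest
        c = tour_cost(w, tour)
        if best is None or c < best[0]:
            best = (c, tour)
    return best
```

For the 4-vertex "unit square" weights `[[0, 1, 2, 1], [1, 0, 1, 2], [2, 1, 0, 1], [1, 2, 1, 0]]`, `tsp_brute_force` returns `(4, (0, 1, 2, 3))`.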

2. Approximation algorithms

For hard problems, there are three common approaches:

to use the available (exponential) algorithm if the actual input size is small;
to redefine the problem by reducing the range of instances;
to redefine the problem by expanding the range of solutions.

Approximation algorithms belong to the third approach.

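For an optimization problem, an approximation ratio ρ(n) is often used to indicate how close an approximate solution is to an optimal one. If, for every input of size n, the cost C of the solution produced by the algorithm and the cost C* of an optimal solution satisfy max(C/C*, C*/C) ≤ ρ(n), the algorithm is called a ρ(n)-approximation algorithm. For a minimization problem the ratio reduces to C/C*, and for a maximization problem to C*/C; in both cases ρ(n) ≥ 1, with equality only for an exact algorithm.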
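The definition can be written as a one-line function; this is only a sketch assuming positive costs. Taking the maximum of the two quotients makes the same formula work for minimization and maximization problems.

```python
def approximation_ratio(c, c_opt):
    """max(C/C*, C*/C) for positive costs C (found) and C* (optimal); always >= 1."""
    if c <= 0 or c_opt <= 0:
        raise ValueError("costs must be positive")
    return max(c / c_opt, c_opt / c)
```

For a minimization problem where the algorithm finds cost 12 and the optimum is 10, the ratio is 1.2; for a maximization problem where the algorithm finds 5 and the optimum is 10, it is 2.0.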

Usually there is a trade-off between computation time and the quality of the approximation.

For example, if in a Traveling-Salesman Problem (TSP) we can assume the triangle inequality, that is, c(u, w) ≤ c(u, v) + c(v, w) for all vertices u, v, w (going directly is never longer than going through an intermediate vertex), then an approximate solution can be obtained by first building a minimum spanning tree for the graph and then planning a tour based on the tree.
[Figure 12-01: the Approx-TSP-Tour algorithm]
In the last step, a "Hamiltonian cycle" is a cycle in an undirected graph which visits each vertex exactly once.
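A sketch of the algorithm, following the common textbook formulation (as in CLRS's APPROX-TSP-TOUR): compute a minimum spanning tree with Prim's algorithm, list the vertices in the order of a preorder walk of the tree, and return to the root to close the Hamiltonian cycle. That the figure uses exactly this variant is an assumption; the weight-matrix input and the helper `euclidean_weights` are illustrative additions.

```python
import math


def euclidean_weights(points):
    """Hypothetical helper: complete-graph weights from 2-D points (these obey the triangle inequality)."""
    return [[math.dist(p, q) for q in points] for p in points]


def prim_mst(w, root=0):
    """Prim's algorithm, simple array version: O(V^2) on a complete graph.

    Returns parent, where parent[v] is v's parent in the MST (None for the root).
    """
    n = len(w)
    in_tree = [False] * n
    key = [math.inf] * n   # cheapest known edge connecting v to the tree
    parent = [None] * n
    key[root] = 0
    for _ in range(n):
        # Add the cheapest vertex not yet in the tree (ties go to the lowest index).
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: key[v])
        in_tree[u] = True
        for v in range(n):
            if not in_tree[v] and w[u][v] < key[v]:
                key[v] = w[u][v]
                parent[v] = u
    return parent


def approx_tsp_tour(w, root=0):
    """Approx-TSP-Tour sketch: MST, then preorder walk, then back to the root."""
    parent = prim_mst(w, root)
    children = [[] for _ in range(len(w))]
    for v, p in enumerate(parent):
        if p is not None:
            children[p].append(v)
    tour, stack = [], [root]
    while stack:
        u = stack.pop()
        tour.append(u)                        # first visit of u in the tree walk
        stack.extend(reversed(children[u]))   # children are then visited in index order
    return tour + [root]                      # close the Hamiltonian cycle
```

For the corners of a unit square, `approx_tsp_tour(euclidean_weights([(0, 0), (1, 0), (1, 1), (0, 1)]))` returns `[0, 1, 2, 3, 0]`, which here happens to be optimal.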

In the following example, the actual (Euclidean) distance between vertices is used as the weight of the edge connecting them, so the triangle inequality holds. The first four panels show how the algorithm works step by step, while the last one shows an optimal solution.
[Figure 12-02: example run of Approx-TSP-Tour, panels (a) to (e)]

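The running time of Approx-TSP-Tour is polynomial: it is dominated by building the minimum spanning tree, which takes O(V²) time on a complete graph with the simple array-based version of Prim's algorithm, while the tree walk takes only O(V) time.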

To evaluate the approximation ratio of Approx-TSP-Tour, we consider the following items:

T, the minimum spanning tree
W, a "full walk" of T that lists a vertex whenever it is reached (or returned to) during the tree walk; for the given example, it is a,b,c,b,h,b,a,d,e,f,e,g,e,d,a (see figure (c))
H, the tour found by the algorithm
H*, the (unknown) optimal tour
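The relation between W and H can be sketched directly: generate the full walk of a rooted tree, then skip every vertex that was already visited and close the cycle. The tree below is reconstructed from the walk listed above (edges a-b, b-c, b-h, a-d, d-e, e-f, e-g, children in alphabetical order); the dict-of-children representation and function names are assumptions of the sketch.

```python
def full_walk(children, root):
    """Full walk W: list a vertex when first reached and each time the walk returns to it."""
    walk = [root]
    for child in children.get(root, []):
        walk += full_walk(children, child)   # go down into the child's subtree; this part ends at the child
        walk.append(root)                    # then return to root
    return walk


def shortcut(walk):
    """Keep only the first occurrence of each vertex, then return to the start: W becomes H."""
    seen, tour = set(), []
    for v in walk:
        if v not in seen:
            seen.add(v)
            tour.append(v)
    return tour + [tour[0]]


T = {"a": ["b", "d"], "b": ["c", "h"], "d": ["e"], "e": ["f", "g"]}
W = full_walk(T, "a")   # a,b,c,b,h,b,a,d,e,f,e,g,e,d,a
H = shortcut(W)         # a,b,c,h,d,e,f,g,a
```

Each tree edge appears twice in W (once going down, once coming back), which is the fact c(W) = 2c(T) used below.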

The total costs (distances) of the four are related as follows:

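c(H) ≤ c(W) [H is W with repeated vertices skipped, and by the triangle inequality skipping a vertex never increases the cost]
c(W) = 2c(T) [W walks every edge in T twice]
c(T) ≤ c(H*) [removing any edge from the tour H* leaves a spanning tree whose cost is at most c(H*), since weights are nonnegative, and T is a minimum spanning tree]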

Therefore, c(H) ≤ 2c(H*), or c(H)/c(H*) ≤ 2. Since c(H*)/c(H) ≤ 1, Approx-TSP-Tour is a 2-approximation algorithm.
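The bound can also be checked empirically on small Euclidean instances, where the triangle inequality holds. This self-contained sketch compares the MST-preorder tour with the optimum found by exhaustive search over random points in the unit square; the trial count, instance size, and seed are arbitrary choices, and a passing run is only a sanity check, not a proof.

```python
import math
import random
from itertools import permutations


def tour_cost(w, tour):
    """Cost of a closed tour given as a vertex list that ends at its start."""
    return sum(w[tour[i]][tour[i + 1]] for i in range(len(tour) - 1))


def approx_tour(w):
    """MST (Prim, array version) rooted at 0, then preorder walk, then back to 0."""
    n = len(w)
    in_tree, key, parent = [False] * n, [math.inf] * n, [None] * n
    key[0] = 0
    for _ in range(n):
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: key[v])
        in_tree[u] = True
        for v in range(n):
            if not in_tree[v] and w[u][v] < key[v]:
                key[v], parent[v] = w[u][v], u
    children = [[] for _ in range(n)]
    for v in range(1, n):            # every non-root vertex has a parent
        children[parent[v]].append(v)
    tour, stack = [], [0]
    while stack:
        u = stack.pop()
        tour.append(u)
        stack.extend(reversed(children[u]))
    return tour + [0]


def optimal_cost(w):
    """c(H*) by exhaustive search over all tours starting at vertex 0."""
    return min(tour_cost(w, [0, *p, 0]) for p in permutations(range(1, len(w))))


def worst_ratio(trials=100, n=6, seed=1):
    """Largest observed c(H)/c(H*) over random Euclidean instances."""
    rng = random.Random(seed)
    worst = 1.0
    for _ in range(trials):
        pts = [(rng.random(), rng.random()) for _ in range(n)]
        w = [[math.dist(p, q) for q in pts] for p in pts]
        worst = max(worst, tour_cost(w, approx_tour(w)) / optimal_cost(w))
    return worst
```

By the argument above, `worst_ratio()` can never exceed 2; on random points the observed ratios are usually well below that worst case.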