CIS 5511. Programming Techniques

Growth of Functions

 

A central topic in algorithm analysis is comparing the time (or space) efficiency of algorithms for the same problem. Since the running time of an algorithm is represented as a function of instance size, the comparison becomes a matter of categorizing these functions.

1. Asymptotic notations

To simplify the analysis of the time expense of algorithms, the following assumptions are usually made:
  1. A size measurement can be established among the (infinite number of) instances of a problem to indicate their difficulty, so that the time spent by an algorithm on an instance is a nondecreasing function of its size;
  2. The time expense of an algorithm becomes an issue when the size is large, and the major factor indicating the time efficiency of an algorithm is how fast it grows with the size;
  3. To compare the time efficiency of algorithms, it is often enough to find the order of growth for each of them, so as to categorize them using representative functions.
For this reason, we study the asymptotic efficiency of algorithms, that is, how the running time of an algorithm increases with the size of its input in the limit, as the input size increases without bound.

Each asymptotic notation corresponds to a set of functions, represented by a simple function g(n):

  Θ(g(n)) = { f(n) : there exist positive constants c1, c2, and n0 such that 0 ≤ c1 g(n) ≤ f(n) ≤ c2 g(n) for all n ≥ n0 }
  O(g(n)) = { f(n) : there exist positive constants c and n0 such that 0 ≤ f(n) ≤ c g(n) for all n ≥ n0 }
  Ω(g(n)) = { f(n) : there exist positive constants c and n0 such that 0 ≤ c g(n) ≤ f(n) for all n ≥ n0 }
  o(g(n)) = { f(n) : for any positive constant c, there exists a constant n0 > 0 such that 0 ≤ f(n) < c g(n) for all n ≥ n0 }
  ω(g(n)) = { f(n) : for any positive constant c, there exists a constant n0 > 0 such that 0 ≤ c g(n) < f(n) for all n ≥ n0 }

Examples: n^2/2 − 3n = Θ(n^2), 5n = O(n^2), √n = Ω(lg n), 2n = o(n^2), and n^2/2 = ω(n).

Intuitively, the five notations correspond to the relations =, ≤, ≥, <, and >, respectively, when used to compare the growth order of two functions (though not all functions are asymptotically comparable). All five are transitive, the first three are reflexive, and only the first one is symmetric.

Relations among the five: f(n) = Θ(g(n)) if and only if f(n) = O(g(n)) and f(n) = Ω(g(n)); f(n) = o(g(n)) implies f(n) = O(g(n)) but excludes f(n) = Θ(g(n)); symmetrically, f(n) = ω(g(n)) implies f(n) = Ω(g(n)) but excludes f(n) = Θ(g(n)).

One way to distinguish them is to examine the limit of f(n)/g(n) as n increases without bound (when the limit exists): a limit of 0 corresponds to f(n) = o(g(n)), a limit of ∞ to f(n) = ω(g(n)), and a finite positive limit implies f(n) = Θ(g(n)). The common growth orders, from low to high, are: constant (Θ(1)), logarithmic (Θ(lg n)), polynomial, exponential (such as Θ(2^n)), and factorial (Θ(n!)). Polynomial functions can be further divided into linear (Θ(n)), quadratic (Θ(n^2)), cubic (Θ(n^3)), and so on.
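
As a rough numerical illustration (not a proof), one can watch how the ratio f(n)/g(n) behaves as n grows. A small Python sketch, with the particular function choices being illustrative assumptions:

    def ratio_trend(f, g, sizes=(10, 100, 1000, 10000, 100000)):
        # Print f(n)/g(n) for growing n: a ratio tending to 0 suggests f = o(g),
        # a bounded positive ratio suggests f = Θ(g), and a growing ratio suggests f = ω(g).
        for n in sizes:
            print(n, f(n) / g(n))

    ratio_trend(lambda n: 2 * n**2 + 3 * n + 1, lambda n: n**2)   # tends to 2, so Θ(n^2)
    ratio_trend(lambda n: 10 * n, lambda n: n**2)                 # tends to 0, so o(n^2)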

Asymptotic notations can be used in equations and inequalities, as well as in certain calculations. For example,
2n^2 + 3n + 1 = Θ(n^2) + Θ(n) + Θ(1) = Θ(n^2).
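
For instance, explicit constants witnessing 2n^2 + 3n + 1 = Θ(n^2) are easy to exhibit: for all n ≥ 1, 2n^2 ≤ 2n^2 + 3n + 1 ≤ 2n^2 + 3n^2 + n^2 = 6n^2, so c1 = 2, c2 = 6, and n0 = 1 satisfy the definition of Θ(n^2).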

In algorithm analysis, the most common conclusions are worst-case expenses expressed as O(f(n)), which can be obtained by focusing on the most expensive term in the running-time function, as in the analysis of the Insertion Sort algorithm.
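
As a reminder of that analysis, here is a minimal Insertion Sort sketch (in Python, as an illustration, not taken from the notes). In the worst case the inner loop runs i times for position i, so the total work is 1 + 2 + ... + (n − 1) = Θ(n^2), and the worst-case running time is O(n^2).

    def insertion_sort(a):
        # Sort the list a in place.
        for i in range(1, len(a)):            # outer loop: n - 1 iterations
            key = a[i]
            j = i - 1
            while j >= 0 and a[j] > key:      # inner loop: at most i iterations
                a[j + 1] = a[j]               # shift larger elements one position to the right
                j -= 1
            a[j + 1] = key                    # insert key into its sorted position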

 

2. Analysis of recurrences

When an algorithm contains a recursive call to itself, its running time can often be described by a recurrence, i.e., an equation or inequality that describes a function in terms of its value on smaller inputs.

In general, the "divide-and-conquer" approach uses D(n) time to divide a problem of size n into a subproblems of the same type, each of size n/b. After the subproblems are solved, their solutions are combined in C(n) time to get the solution for the original problem. This process is repeated on the subproblems, until the input size becomes so small (say, n ≤ c for some constant c) that the problem can be solved in constant time. This approach gives us the following recurrence:

T(n) = Θ(1)                          if n ≤ c
T(n) = a T(n/b) + D(n) + C(n)        otherwise

For example, for the Merge Sort algorithm, we have

T(n) = Θ(1)                 if n = 1
T(n) = 2 T(n/2) + Θ(n)      if n > 1

This is the case because each time an array of size n is divided into two halves, and merging the sorted halves back together takes linear time.
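
A minimal Merge Sort sketch (in Python, as an illustration) makes the two recursive calls and the linear-time merge visible:

    def merge_sort(a):
        # Return a sorted copy of the list a.
        if len(a) <= 1:                    # base case: constant time
            return a
        mid = len(a) // 2
        left = merge_sort(a[:mid])         # T(n/2)
        right = merge_sort(a[mid:])        # T(n/2)
        return merge(left, right)          # Θ(n)

    def merge(left, right):
        # Merge two sorted lists into one sorted list in linear time.
        result, i, j = [], 0, 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                result.append(left[i]); i += 1
            else:
                result.append(right[j]); j += 1
        result.extend(left[i:])
        result.extend(right[j:])
        return result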

Often we need to solve the recurrence, so as to get a bound on the running time that does not involve a recursive reference.

One way to solve a recurrence is the substitution method, which uses mathematical induction to prove that a bound guessed in advance is correct.

For example, if the recurrence is

T(n) = 2 T(⌊n/2⌋) + n

one reasonable guess is T(n) = O(n lg n). To show that this is indeed the case, we can prove that T(n) ≤ c n lg n for an appropriate choice of the constant c > 0. We start by assuming that this bound holds for halves of the array, i.e., T(⌊n/2⌋) ≤ c ⌊n/2⌋ lg(⌊n/2⌋); substituting into the recurrence yields

T(n) ≤ 2 (c ⌊n/2⌋ lg(⌊n/2⌋)) + n
     ≤ c n lg(n/2) + n
     = c n lg n − c n lg 2 + n
     = c n lg n − c n + n
     ≤ c n lg n

where the last step holds as long as c ≥ 1.

Furthermore, mathematical induction requires us to show that our solution holds for the boundary conditions, that is, that we can choose the constant c large enough so that T(n) ≤ c n lg n holds there. For this example, if the boundary condition is T(1) = 1, then no choice of c works at n = 1, because c n lg n = c · 1 · lg 1 = 0 < 1. To get around this, it is enough to show that T(n) ≤ c n lg n holds once n is above a certain value, and to treat the small values as base cases. For n = 2 we have T(2) = 2T(1) + 2 = 4, and T(2) ≤ c · 2 · lg 2 = 2c holds as long as c ≥ 2.

In summary, we can use c ≥ 2, and we have shown that T(n) ≤ c n lg n for all n ≥ 2. (The other small value that depends directly on the boundary, n = 3, can be checked in the same way: T(3) = 2T(1) + 3 = 5 ≤ c · 3 · lg 3.)
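
As a sanity check of the bound (an illustration, not part of the proof), one can compute T(n) directly from the recurrence and compare it with 2 n lg n:

    from functools import lru_cache
    from math import log2

    @lru_cache(maxsize=None)
    def T(n):
        # T(1) = 1, T(n) = 2*T(floor(n/2)) + n
        return 1 if n == 1 else 2 * T(n // 2) + n

    # The guessed bound with c = 2 holds for every n from 2 up to 10000.
    assert all(T(n) <= 2 * n * log2(n) for n in range(2, 10001))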

To get a good guess for a recurrence, one way is to draw a recursion tree. For example, for the recurrence

T(n) = 3 T(n/4) + c n^2

we can draw a recursion tree: the root costs c n^2 and has three children, each the root of a subtree for an input of size n/4; each of those nodes again has three children for inputs of size n/16, and so on, until the subproblem size reaches 1 at the leaves.

To determine the height of the tree, note that the subproblem size at depth i is n/4^i; the size reaches 1 when i = log_4 n, so the tree has log_4 n + 1 levels.

At depth i, there are 3^i nodes, each with a cost of c(n/4^i)^2, so the total cost of that level is (3/16)^i c n^2.

At the leaf level, the number of nodes is 3^(log_4 n), which is equal to n^(log_4 3) (see page 54). At that level, each node costs T(1), so the total is n^(log_4 3) T(1), which is Θ(n^(log_4 3)).

Summing over the levels, the per-level costs (3/16)^i c n^2 form a decreasing geometric series bounded by (16/13) c n^2, and the leaf cost Θ(n^(log_4 3)) grows more slowly than n^2; so it is guessed that T(n) = O(n^2) (see the textbook for the detailed calculation).
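
As a quick numerical check of this guess (an illustration only), one can evaluate the recurrence with c = 1 and T(1) = 1 and watch T(n)/n^2 stay bounded:

    from functools import lru_cache

    @lru_cache(maxsize=None)
    def T(n):
        # T(1) = 1, T(n) = 3*T(floor(n/4)) + n^2  (taking c = 1)
        return 1 if n <= 1 else 3 * T(n // 4) + n * n

    for n in (4, 16, 256, 4096, 65536):
        print(n, T(n) / (n * n))    # the ratio stays below 16/13 ≈ 1.23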

Finally, there is a "master method" that provides a ready-made solution for many divide-and-conquer recurrences of the form T(n) = a T(n/b) + f(n). It depends on the following theorem:

Master theorem: Let a ≥ 1 and b > 1 be constants, let f(n) be a function, and let T(n) be defined by the recurrence T(n) = a T(n/b) + f(n), where n/b stands for either ⌊n/b⌋ or ⌈n/b⌉. Then:
  1. If f(n) = O(n^(log_b a − ε)) for some constant ε > 0, then T(n) = Θ(n^(log_b a)).
  2. If f(n) = Θ(n^(log_b a)), then T(n) = Θ(n^(log_b a) lg n).
  3. If f(n) = Ω(n^(log_b a + ε)) for some constant ε > 0, and if a f(n/b) ≤ c f(n) for some constant c < 1 and all sufficiently large n, then T(n) = Θ(f(n)).

Intuitively, the theorem says that the solution to the recurrence is determined by the larger of f(n) and n^(log_b a). The above example falls into Case 3.
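
For instance, checking Case 3 against the earlier example T(n) = 3T(n/4) + c n^2: here a = 3 and b = 4, so n^(log_b a) = n^(log_4 3) ≈ n^0.79; f(n) = c n^2 is Ω(n^(log_4 3 + ε)) for, say, ε = 1; and the regularity condition holds because a f(n/b) = 3c(n/4)^2 = (3/16) c n^2 = (3/16) f(n). Hence T(n) = Θ(n^2), agreeing with the recursion-tree guess.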

The proof of the master theorem is given in the textbook. It is important to realize that the three cases of the master theorem do not cover all the possibilities: there are gaps between Case 1 and Case 2, and between Case 2 and Case 3, in which the theorem does not apply.

As a special case, when a = b we have log_b a = 1. The above results then simplify to (1) Θ(n), (2) Θ(n lg n), and (3) Θ(f(n)), respectively. We can see that Merge Sort, with a = b = 2 and f(n) = Θ(n), belongs to Case 2.
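
For driving functions of the simple polynomial form f(n) = Θ(n^k), the case analysis reduces to comparing k with log_b a. A small Python sketch of this special case (the function name and the restriction to polynomial f are assumptions for illustration, not part of the textbook):

    from math import log, isclose

    def master_poly(a, b, k):
        # Solve T(n) = a*T(n/b) + Θ(n^k) by the master theorem, for polynomial f only.
        e = log(a, b)                    # the critical exponent log_b a
        if isclose(k, e):                # Case 2: f(n) = Θ(n^(log_b a))
            return f"Θ(n^{e:.3g} lg n)"
        if k < e:                        # Case 1: f(n) grows polynomially more slowly
            return f"Θ(n^{e:.3g})"
        # Case 3: k > log_b a; the regularity condition holds automatically here,
        # since a*(n/b)^k = (a/b^k)*n^k with a/b^k < 1.
        return f"Θ(n^{k:.3g})"

    print(master_poly(2, 2, 1))   # Merge Sort: Θ(n^1 lg n)
    print(master_poly(3, 4, 2))   # the recursion-tree example: Θ(n^2)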