CIS 9615. Analysis of Algorithms

Median and Order Statistics

 

1. The selection problem

Selection problem: given n distinct numbers and an integer i with 1 ≤ i ≤ n, find the ith smallest element. The ith order statistic of a set is its ith smallest element.

Special case: median (in statistics), lower median (textbook default, i = ⌊(n+1)/2⌋), upper median (i = ⌈(n+1)/2⌉).

Easy cases: minimum (or maximum): Algorithm-0 takes n-1 comparisons in an unsorted data structure (no more, no less?). Best case, worst case, and average case.

To find both the minimum and the maximum: two independent scans take 2n-2 comparisons. Can it be done with fewer?
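One standard way to find both extremes with fewer comparisons is to process the elements in pairs: compare the two elements of a pair with each other, then compare the smaller one with the current minimum and the larger one with the current maximum. That is 3 comparisons per 2 elements, or at most 3⌊n/2⌋ in total. The sketch below is only an illustration; the function name min_and_max and the explicit comparison counter are assumptions made for this example, not part of the notes.

```python
def min_and_max(values):
    """Return (minimum, maximum, comparisons) by processing elements in pairs."""
    n = len(values)
    if n == 0:
        raise ValueError("values must be non-empty")
    comparisons = 0
    if n % 2 == 1:
        # Odd n: the first element starts as both minimum and maximum.
        lo = hi = values[0]
        start = 1
    else:
        # Even n: one comparison orders the first pair.
        comparisons += 1
        if values[0] < values[1]:
            lo, hi = values[0], values[1]
        else:
            lo, hi = values[1], values[0]
        start = 2
    for j in range(start, n, 2):
        a, b = values[j], values[j + 1]
        comparisons += 3  # one within the pair, one against lo, one against hi
        if a < b:
            small, large = a, b
        else:
            small, large = b, a
        if small < lo:
            lo = small
        if large > hi:
            hi = large
    return lo, hi, comparisons


print(min_and_max([7, 3, 9, 1, 8, 2, 6, 4]))  # (1, 9, 10)
```

For n = 8 this uses 10 comparisons, compared with 2n-2 = 14 for two separate scans.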

How about the 2nd minimum (or maximum)? What if the elements are processed in groups of 3 or more?
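One possible answer to the second-minimum question is a knockout tournament: the overall minimum is found with n-1 comparisons, and the second minimum must be one of the at most ⌈lg n⌉ elements that lost directly to it, so roughly n + lg n comparisons suffice. The following is a minimal sketch under that assumption; the name second_smallest and the list-of-losers bookkeeping are choices made for illustration only.

```python
def second_smallest(values):
    """Return the second smallest of distinct values using a knockout tournament."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    # Each player is (value, list of values it has beaten so far).
    players = [(v, []) for v in values]
    while len(players) > 1:
        winners = []
        for j in range(0, len(players) - 1, 2):
            a, b = players[j], players[j + 1]
            if a[0] < b[0]:
                a[1].append(b[0])
                winners.append(a)
            else:
                b[1].append(a[0])
                winners.append(b)
        if len(players) % 2 == 1:
            # An unpaired player advances without a comparison.
            winners.append(players[-1])
        players = winners
    _minimum, beaten = players[0]
    # The second smallest lost only to the minimum, so it is among `beaten`.
    return min(beaten)


print(second_smallest([7, 3, 9, 1, 8, 2, 6, 4]))  # 2
```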

If i is a constant, extend the algorithms above to find

  1. the ith smallest element, or the bottom i elements;
  2. the median, or the bottom i% of the elements.
What is the difference, in asymptotic notation, between the above two situations?

Selection by sorting: O(n lg n). Can it be faster if we don't need total order? How about stopping a sorting algorithm somewhere in the middle?
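As one illustration of "stopping in the middle", selection sort can be halted after i passes, since pass number p puts the pth smallest element in its final place. This costs about i·n comparisons: O(n) when i is a constant, but Θ(n²) when i is a fixed fraction of n (such as the median). The function name select_by_partial_sort is an assumption made for this sketch.

```python
def select_by_partial_sort(values, i):
    """Return the i-th smallest (1-indexed) by running only i passes of selection sort."""
    if not 1 <= i <= len(values):
        raise ValueError("i must satisfy 1 <= i <= n")
    a = list(values)  # work on a copy so the input is left untouched
    for pos in range(i):
        # Find the smallest element of a[pos:] and move it to position pos.
        smallest = pos
        for j in range(pos + 1, len(a)):
            if a[j] < a[smallest]:
                smallest = j
        a[pos], a[smallest] = a[smallest], a[pos]
    # After i passes, a[0:i] holds the bottom i elements in sorted order.
    return a[i - 1]


print(select_by_partial_sort([7, 3, 9, 1, 8, 2, 6, 4], 3))  # 3
```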

 

2. Selection by partition with random pivot

The idea: a partition finds the rank of the pivot without sorting all the numbers. When the pivot is the kth smallest, the process either ends (if k = i), or continues on one side of the pivot, reducing the instance size to k-1 or n-k, that is, to at most max(k-1, n-k).

The algorithm Randomized-Select (page 186) uses Randomized-Partition (page 154), and has a worst-case running time of Θ(n²).
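The partition-based idea can be sketched as follows. This is not the textbook's pseudocode: it is an iterative variant written for illustration, it works on a copy of the input, and the name randomized_select and the particular partition scheme (pivot swapped to the end, then a single left-to-right scan) are assumptions of this sketch.

```python
import random


def randomized_select(values, i):
    """Return the i-th smallest (1-indexed) of distinct values, expected O(n) time."""
    a = list(values)
    if not 1 <= i <= len(a):
        raise ValueError("i must satisfy 1 <= i <= n")
    lo, hi = 0, len(a) - 1
    while lo < hi:
        # Choose a random pivot and move it to the end of a[lo..hi].
        p = random.randint(lo, hi)
        a[p], a[hi] = a[hi], a[p]
        pivot = a[hi]
        # Partition: elements smaller than the pivot go to the front.
        store = lo
        for j in range(lo, hi):
            if a[j] < pivot:
                a[store], a[j] = a[j], a[store]
                store += 1
        a[store], a[hi] = a[hi], a[store]
        k = store - lo + 1  # rank of the pivot within a[lo..hi]
        if i == k:
            return a[store]
        if i < k:
            hi = store - 1  # continue in the k-1 smaller elements
        else:
            lo = store + 1  # continue in the n-k larger elements
            i -= k
    return a[lo]


print(randomized_select([7, 3, 9, 1, 8, 2, 6, 4], 4))  # 4
```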

The expected running time of the algorithm satisfies E[T(n)] ≤ Σ[k=1..n] (1/n) E[T(max(k-1, n-k))] + O(n), which can be proved to be O(n). Therefore the expected running time is of the same order as a single partition.

Or think of it this way: there is a 98% chance that a randomly selected pivot falls between the 1st and 99th percentiles and so reduces the problem size by at least 1%. Then T(n) ≤ T(0.99n) + O(n), which gives T(n) = O(n) by the Master Theorem.

 

3. Selection by partition with selected pivot

How to get an O(n) worst-case time? Reduce the instance size by a constant proportion after each partition.

The SELECT algorithm (page 189): divide the n elements into groups of c (a constant), find the median of each group (using Insertion-Sort), then select the median of the group medians recursively, and finally use the median-of-medians as the pivot for partition. Intuitively, such a pivot reduces the instance size by at least about 1/4: it is larger than about half of the elements in half of the groups, and smaller than about half of the elements in the other half of the groups.
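The steps above can be sketched as follows, assuming distinct elements as in the problem statement. Several simplifications are assumptions of this sketch rather than the textbook's method: Python's built-in sorted stands in for Insertion-Sort on each small group, the partition builds new lists instead of rearranging in place, and the names select, _select and group_size are hypothetical.

```python
def select(values, i, group_size=5):
    """Return the i-th smallest (1-indexed) of distinct values via median of medians."""
    if group_size < 2:
        raise ValueError("group_size must be at least 2")
    if not 1 <= i <= len(values):
        raise ValueError("i must satisfy 1 <= i <= n")
    return _select(list(values), i, group_size)


def _select(a, i, c):
    # Small instances are solved directly by sorting.
    if len(a) <= c:
        return sorted(a)[i - 1]
    # Step 1: split into groups of c and take the lower median of each group.
    medians = []
    for j in range(0, len(a), c):
        group = sorted(a[j:j + c])
        medians.append(group[(len(group) - 1) // 2])
    # Step 2: recursively select the lower median of the group medians.
    pivot = _select(medians, (len(medians) + 1) // 2, c)
    # Step 3: partition around the pivot (elements are assumed distinct).
    smaller = [x for x in a if x < pivot]
    larger = [x for x in a if x > pivot]
    k = len(smaller) + 1  # rank of the pivot in a
    if i == k:
        return pivot
    if i < k:
        return _select(smaller, i, c)
    return _select(larger, i - k, c)


data = [(37 * x) % 101 for x in range(1, 101)]  # the numbers 1..100 in scrambled order
print(select(data, 50))  # 50
```

The default group_size of 5 matches the value of c for which the textbook proves the linear bound; other values are accepted here only so the effect of c can be explored.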

The textbook proves that the selection algorithm has a linear worst-case running time when c = 5.