9615-04

CIS 9615. Analysis of Algorithms

Median and Order Statistics

1. The selection problem

Selection problem: find the ith smallest element of a set of n distinct numbers. The ith order statistic of a set is its ith smallest element.

Special case: median (in statistics). When n is even there are two: the lower median (textbook default), at i = floor((n+1)/2), and the upper median, at i = ceiling((n+1)/2).

Easy cases: minimum (or maximum): Algorithm-0 takes n-1 comparisons in an unsorted data structure (no more, no less?). Best case, worst case, and average case.

Finding both the minimum and the maximum:

Algorithm-1: run Algorithm-0 twice, for 2n - 2 comparisons in total.
Algorithm-2: process the numbers in pairs, at 3 comparisons per pair. The total is 3n/2 - 2 if n is even (the first pair costs only 1), and 3(n-1)/2 if n is odd (the first number is set as both min and max). In summary, it is ceiling(3n/2) - 2.
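
The pairwise scheme can be sketched as follows. This is a minimal illustration, not the textbook's pseudocode: the function name min_max and the returned comparison counter are assumptions added here so that the count can be checked against ceiling(3n/2) - 2.

```python
def min_max(values):
    """Return (minimum, maximum, comparisons) using the pairwise scheme."""
    n = len(values)
    if n == 0:
        raise ValueError("min_max() needs at least one value")
    comparisons = 0
    if n % 2 == 0:
        # Even n: the first pair costs one comparison and seeds both extremes.
        comparisons += 1
        if values[0] < values[1]:
            lo, hi = values[0], values[1]
        else:
            lo, hi = values[1], values[0]
        start = 2
    else:
        # Odd n: the first number seeds both extremes at no cost.
        lo = hi = values[0]
        start = 1
    # Every remaining pair costs exactly three comparisons.
    for j in range(start, n, 2):
        a, b = values[j], values[j + 1]
        comparisons += 1          # compare the pair with each other
        if a > b:
            a, b = b, a
        comparisons += 1          # smaller one against the current minimum
        if a < lo:
            lo = a
        comparisons += 1          # larger one against the current maximum
        if b > hi:
            hi = b
    return lo, hi, comparisons


print(min_max([3, 1, 4, 1, 5, 9, 2, 6]))  # (1, 9, 10), and 3*8/2 - 2 = 10
```

The saving over Algorithm-1 comes from the first comparison inside each pair: the smaller element can never be the maximum and the larger can never be the minimum, so each needs only one further comparison.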

How about the 2nd minimum (or maximum)? How about forming groups of 3 elements or more?

Extend the algorithms to find

the ith, or the bottom i, when i is a constant;
the median, or the bottom i%, when the rank grows in proportion to n.

What is the difference, in asymptotic notation, between the above two situations?

Selection by sorting: O(n lg n). Can it be faster if we don't need a total order? How about stopping a sorting algorithm somewhere in the middle?
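
One way to read "stopping in the middle" is to run only the first i passes of selection sort. The sketch below is an illustration of that idea, not an algorithm from the textbook, and the name select_by_partial_sort is made up for this example. It performs roughly i * n comparisons, which is O(n) when i is a constant but Θ(n²) when i is about n/2, as for the median.

```python
def select_by_partial_sort(values, i):
    """Return the ith smallest value (1-indexed) using i passes of selection sort."""
    a = list(values)                      # work on a copy
    if not 1 <= i <= len(a):
        raise ValueError("i must be between 1 and len(values)")
    for p in range(i):
        # Pass p moves the smallest remaining element into position p.
        smallest = p
        for j in range(p + 1, len(a)):
            if a[j] < a[smallest]:
                smallest = j
        a[p], a[smallest] = a[smallest], a[p]
    # After i passes, a[0..i-1] holds the i smallest values in sorted order.
    return a[i - 1]


print(select_by_partial_sort([9, 2, 7, 4, 5], 2))  # 4
```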

2. Selection by partition with random pivot

The idea: a partition finds the relative rank of the pivot without sorting all numbers. When the pivot is the kth smallest, it either ends the process (if k = i), or reduces the instance size to at most max(k-1, n-k).

The algorithm Randomized-Select (page 186) uses Randomized-Partition (page 154), and has a worst-case running time of Θ(n²).
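
The idea can be sketched in a few lines. This is an illustrative version, not the textbook's pseudocode: it is written as a loop rather than a recursion, it works on a copy of the input, and the name randomized_select and the optional rng parameter are assumptions of this sketch.

```python
import random


def randomized_select(values, i, rng=random):
    """Return the ith smallest value (1-indexed) by repeated random partitioning."""
    a = list(values)
    if not 1 <= i <= len(a):
        raise ValueError("i must be between 1 and len(values)")
    lo, hi = 0, len(a) - 1
    while lo < hi:
        # Pick a random pivot and park it at the end of the current range.
        p = rng.randint(lo, hi)           # randint is inclusive on both ends
        a[p], a[hi] = a[hi], a[p]
        pivot = a[hi]
        # Partition: elements <= pivot are moved to the front of the range.
        store = lo
        for j in range(lo, hi):
            if a[j] <= pivot:
                a[store], a[j] = a[j], a[store]
                store += 1
        a[store], a[hi] = a[hi], a[store]
        k = store - lo + 1                # rank of the pivot within a[lo..hi]
        if i == k:
            return a[store]
        if i < k:
            hi = store - 1                # answer lies left of the pivot
        else:
            lo = store + 1                # answer lies right of the pivot
            i -= k
    return a[lo]
```

Only one side of each partition is examined further, which is what separates this from quicksort. The Θ(n²) worst case arises when every pivot happens to be the smallest or largest remaining element.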

The average running time of the algorithm satisfies E[T(n)] ≤ Σ[k=1..n] (1/n) E[T(max(k-1, n-k))] + O(n), which can be proved to be O(n). Therefore the average running time has the same order as a single partition.

Or, think of it this way: there is a 98% chance that a randomly selected pivot will reduce the problem size by at least 1%, so T(n) = T(0.99n) + O(n) = O(n) by the Master Theorem.
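
The same bound also follows by unrolling the recurrence. The derivation below is only a sketch of this informal argument: it assumes every pivot is a good one, and it writes the O(n) partition cost as at most cn for a hypothetical constant c.

```latex
\begin{align*}
T(n) &\le cn + T(0.99\,n) \\
     &\le cn + 0.99\,cn + T(0.99^{2}\,n) \\
     &\le cn \sum_{j=0}^{\infty} 0.99^{j}
      = \frac{cn}{1 - 0.99}
      = 100\,cn
      = O(n).
\end{align*}
```

Any constant shrink factor below 1 gives a convergent geometric series, so the constant 100 changes but the linear bound does not.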

3. Selection by partition with selected pivot

How to get an O(n) worst-case time? Reduce the instance size by a constant proportion after each partition.

The SELECT algorithm (page 189): divide the n elements into groups of c (a constant), find the median of each group (using Insertion-Sort), then select the median of the group medians recursively, and finally use the median-of-medians as pivot to do partition. Intuitively, such a pivot reduces the instance size by at least about 1/4: it is larger than half of the elements in half of the groups, and smaller than half of the elements in the remaining groups.
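
A compact sketch of these steps follows. It is an illustration under stated assumptions rather than the textbook's SELECT: it uses Python's built-in sorted() on each small group in place of Insertion-Sort, builds new lists instead of partitioning in place, and the names select, _select, and group_size are chosen for this example.

```python
def select(values, i, group_size=5):
    """Return the ith smallest value (1-indexed) using a median-of-medians pivot."""
    a = list(values)
    if not 1 <= i <= len(a):
        raise ValueError("i must be between 1 and len(values)")
    if group_size < 3:
        raise ValueError("group_size must be at least 3")
    return _select(a, i, group_size)


def _select(a, i, c):
    # Small instances are solved directly by sorting.
    if len(a) <= c:
        return sorted(a)[i - 1]
    # Step 1: split into groups of c and take the lower median of each group.
    medians = []
    for j in range(0, len(a), c):
        group = sorted(a[j:j + c])
        medians.append(group[(len(group) - 1) // 2])
    # Step 2: recursively select the lower median of the group medians.
    pivot = _select(medians, (len(medians) + 1) // 2, c)
    # Step 3: partition around the pivot (three-way, so duplicates are safe).
    smaller = [x for x in a if x < pivot]
    equal = [x for x in a if x == pivot]
    larger = [x for x in a if x > pivot]
    # Step 4: recurse only into the side that contains the ith smallest.
    if i <= len(smaller):
        return _select(smaller, i, c)
    if i <= len(smaller) + len(equal):
        return pivot
    return _select(larger, i - len(smaller) - len(equal), c)
```

The code runs for any group_size of at least 3, but the linear worst-case bound depends on the choice of c.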

The textbook proves that this selection algorithm runs in linear worst-case time when c = 5.