**CIS 9615. Analysis of Algorithms**
**Median and Order Statistics**

### 1. The selection problem

Selection problem: find the *i*th smallest element among *n* distinct numbers. The *i*th order statistic of a set is its *i*th smallest element.
Special cases: the median (in the statistical sense), the lower median (the textbook's default), and the upper median.

Easy cases: the minimum (or maximum). Algorithm-0 (a single linear scan) takes exactly n - 1 comparisons on an unsorted data structure (no more, and provably no fewer, since every element except the winner must lose at least one comparison). Its best case, worst case, and average case are all the same.
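A minimal sketch of Algorithm-0 (the function name and the comparison counter are illustrative, not from the textbook); the counter makes the "exactly n - 1 comparisons" claim visible:

```python
def minimum(a):
    """Linear scan for the minimum: always exactly len(a) - 1 comparisons."""
    best = a[0]
    comparisons = 0
    for x in a[1:]:
        comparisons += 1      # one comparison per remaining element
        if x < best:
            best = x
    return best, comparisons
```

The count is the same regardless of input order, which is why best, worst, and average cases coincide.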

To find both the minimum and the maximum:

- Algorithm-1: run Algorithm-0 twice, for roughly 2n comparisons.
- Algorithm-2: process the numbers in pairs; each pair costs 3 comparisons (one within the pair, one against the current minimum, one against the current maximum). This gives 3n/2 - 2 comparisons in total if n is even (the first pair costs only 1), and 3(n-1)/2 if n is odd (the first number is set as both min and max). In summary, it is ceiling(3n/2) - 2.
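Algorithm-2 can be sketched as follows (a Python sketch, with a comparison counter added so the ceiling(3n/2) - 2 total can be checked):

```python
def min_and_max(a):
    """Pairwise min-and-max: ceil(3 * len(a) / 2) - 2 comparisons."""
    n = len(a)
    comparisons = 0
    if n % 2 == 0:
        # even n: the first pair costs 1 comparison and sets both min and max
        comparisons += 1
        lo, hi = (a[0], a[1]) if a[0] < a[1] else (a[1], a[0])
        start = 2
    else:
        # odd n: the first number is both the current min and the current max
        lo = hi = a[0]
        start = 1
    for i in range(start, n, 2):
        x, y = a[i], a[i + 1]
        comparisons += 3      # one within the pair, one vs lo, one vs hi
        small, big = (x, y) if x < y else (y, x)
        if small < lo:
            lo = small
        if big > hi:
            hi = big
    return lo, hi, comparisons
```

For n = 6 this performs 1 + 3·2 = 7 = 3·6/2 - 2 comparisons; for n = 5 it performs 3·2 = 6 = 3(5-1)/2.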

What about the 2nd minimum (or maximum)? What about forming groups of 3 or more elements?
If *i* is a constant, extend the algorithms to find

- the *i*th smallest, or the bottom *i* elements;
- the median, or the bottom *i*% of the elements.

What is the difference, in asymptotic notation, between these two situations?
Selection by sorting: O(n lg n). Can it be faster if we don't need a total order? What about stopping a sorting algorithm somewhere in the middle?
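One concrete way to "stop sorting in the middle" is heapselect: build a min-heap in O(n), then pop i times, for O(n + i lg n) total. This is O(n) when *i* is a constant but still O(n lg n) when *i* is near n/2 (the median). A sketch using Python's standard `heapq`:

```python
import heapq

def ith_smallest_by_heap(a, i):
    """Return the i-th smallest element (1-indexed) via a partial heapsort."""
    h = list(a)
    heapq.heapify(h)            # O(n)
    for _ in range(i - 1):      # i - 1 pops, O(lg n) each
        heapq.heappop(h)
    return h[0]
```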

### 2. Selection by partition with random pivot

The idea: a partition finds the relative rank of the pivot without sorting all numbers. When the pivot is the *k*th smallest, it either ends the process (if *k = i*), or reduces the instance size to max(k-1, n-k).
The algorithm Randomized-Select (page 186) uses Randomized-Partition (page 154), and has a Θ(n^2) worst case.
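An iterative Python sketch of the idea (following the textbook's scheme of Randomized-Partition inside a select loop, though the code itself is not the book's pseudocode; inputs are assumed distinct, and *i* is 1-indexed):

```python
import random

def randomized_select(a, i):
    """Return the i-th smallest (1-indexed) of distinct numbers in a.
    Expected O(n) time; worst case Theta(n^2) on unlucky pivot choices."""
    a = list(a)
    lo, hi = 0, len(a) - 1
    while True:
        if lo == hi:
            return a[lo]
        # Randomized-Partition: move a random pivot to the end, partition
        p = random.randint(lo, hi)
        a[p], a[hi] = a[hi], a[p]
        pivot = a[hi]
        store = lo
        for j in range(lo, hi):
            if a[j] < pivot:
                a[store], a[j] = a[j], a[store]
                store += 1
        a[store], a[hi] = a[hi], a[store]
        k = store - lo + 1        # rank of the pivot within a[lo..hi]
        if i == k:
            return a[store]
        elif i < k:               # recurse into the left side
            hi = store - 1
        else:                     # recurse into the right side
            lo = store + 1
            i -= k
```

Note how each partition either finds the answer (k = i) or shrinks the active range to one side, matching the max(k-1, n-k) term in the recurrence.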

The average running time of the algorithm satisfies E[T(n)] <= Σ[k=1..n] (1/n) E[T(max(k-1, n-k))] + O(n), which can be shown to be O(n). Therefore the average running time is of the same order as a single partition.

Or, think of it this way: there is a 98% chance that a randomly selected pivot lands in the middle 98% of the elements and therefore reduces the problem size by at least 1%, so T(n) = T(0.99n) + O(n) = O(n) by the Master Theorem.

### 3. Selection by partition with selected pivot

How can we get an O(n) worst-case time? By reducing the instance size by a constant proportion after each partition.
The SELECT algorithm (page 189): divide the *n* elements into groups of *c* (a constant), find the median of each group (using Insertion-Sort), select the median of the group medians recursively, and finally use this median-of-medians as the pivot for a partition. Intuitively, such a pivot reduces the instance size to at most about 3/4: it is at least as large as the medians of half of the groups, hence larger than roughly half of the elements in each of those groups (about n/4 elements in total), and by the symmetric argument it is smaller than about n/4 elements, so at least about a quarter of the elements fall on each side.

The textbook proves that this selection algorithm has linear worst-case running time using *c = 5*.
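A compact Python sketch of SELECT with groups of c = 5 (a recursive, list-based rendering of the idea rather than the textbook's in-place pseudocode; it assumes distinct inputs and a 1-indexed rank *i*):

```python
def select(a, i):
    """Median-of-medians SELECT: i-th smallest (1-indexed), worst-case O(n)."""
    a = list(a)
    if len(a) <= 5:
        return sorted(a)[i - 1]          # tiny base case: sort directly
    # Median of each group of 5 (sorting 5 elements is O(1) per group).
    medians = [sorted(a[j:j + 5])[len(a[j:j + 5]) // 2]
               for j in range(0, len(a), 5)]
    # Recursively find the median of the medians; use it as the pivot.
    pivot = select(medians, (len(medians) + 1) // 2)
    lows = [x for x in a if x < pivot]
    highs = [x for x in a if x > pivot]
    k = len(lows) + 1                    # rank of the pivot
    if i == k:
        return pivot
    elif i < k:
        return select(lows, i)
    else:
        return select(highs, i - k)
```

The two recursive calls shrink to n/5 and to at most roughly 3n/4 of the input, and since 1/5 + 3/4 < 1 the recurrence T(n) <= T(n/5) + T(3n/4) + O(n) solves to O(n), matching the textbook's conclusion.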