
Chapter 8

Non-Comparison Sorting

All comparison sorting algorithms discussed in Chapter 3 have a worst-case running time of Ω(n lg n) or larger. Can we do better? The answer is no if the sorting is based on comparisons; we will prove this conclusion in this chapter. Moreover, we will show that non-comparison sorting techniques can achieve faster running times.

8.1 Lower Bounds for Comparison Sorting

Any comparison sorting algorithm can be represented by a binary decision tree. A decision tree models a process that makes a decision based on a sequence of tests, where the test to perform at each step is determined by the outcomes of the previous tests.

A binary search tree is a good example of a decision tree. For example, the binary search tree for the sequence 3, 6, 8, 10, 12, 13, 20 is shown below.

Example of a sorting algorithm.

The following binary tree represents an algorithm that sorts the three numbers in A[1], A[2], and A[3]; each root-to-leaf path corresponds to one possible execution. Note that any correct algorithm should clearly define how to construct the decision tree for any input size n.

Fig. 8-1

In this decision tree, each leaf corresponds to a decision that tells us how to rearrange the three elements into increasing order. This rearrangement corresponds to a permutation of the three numbers.
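For instance, the following Python sketch (our own illustration; its tree shape may differ from the exact tree in Fig. 8-1) sorts three numbers with explicit nested comparisons, so that each execution path traces one root-to-leaf path of a decision tree with 3! = 6 leaves:

def sort3(a, b, c):
    # Each comparison is an internal node of the decision tree; each
    # return is a leaf whose permutation puts the inputs in increasing order.
    if a <= b:
        if b <= c:
            return (a, b, c)   # 2 comparisons on this path
        elif a <= c:
            return (a, c, b)   # 3 comparisons
        else:
            return (c, a, b)
    else:
        if a <= c:
            return (b, a, c)   # 2 comparisons
        elif b <= c:
            return (b, c, a)
        else:
            return (c, b, a)

print(sort3(8, 3, 5))  # (3, 5, 8)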

In general, a decision tree for sorting n numbers must have at least n! leaves, each of which represents a permutation of the n numbers. Applying the permutation produces a sorted sequence. How to apply the permutation is not shown by the decision tree, but is clearly specified by the algorithm.

Then, why do we need the decision tree?

The decision tree is usually used to evaluate the complexity. For example, the longest path from the root to a leaf in the decision tree corresponds to the worst case. This is because the length of the path equals the number of tests performed to reach the decision (leaf). The shortest path from the root to a leaf corresponds to the best case. The average path length represents the average complexity of the algorithm.

Lemma 1 In any rooted binary tree with height h and L leaves, we have the relation L ≤ 2^h (or, equivalently, h ≥ lg L).

Proof. Fig. 8-2 illustrates the case of a complete binary tree, for which the lemma holds with L = 2^h. If the tree is not complete, the number of leaves is less than 2^h. Therefore, in any case, we have L ≤ 2^h (or h ≥ lg L). ∎

level   number of nodes
0       1
1       2
2       2^2
…       …
i       2^i
…       …
h       2^h

Fig. 8-2

Theorem 1 Any comparison sorting algorithm for n numbers requires Ω(n lg n) comparisons in the worst case.

Proof. The decision tree corresponding to a comparison sorting algorithm for n numbers must contain at least n! leaves, one for each possible permutation of the n input numbers. By Lemma 1, we have 2^h ≥ n!.

That is, h ≥ lg(n!), or h ≥ ⌈lg(n!)⌉,

which means that the longest path has length ⌈lg(n!)⌉ or larger.

Because n! ≈ √(2πn) (n/e)^n (Stirling's approximation), we have

h ≥ lg(n!) ≈ 0.5 lg(2πn) + n lg(n/e)
  = 0.5 lg(2π) + 0.5 lg n + n lg n − n lg e
  ≈ n lg n.

Therefore, h = Ω(n lg n). ∎

Theorem 1 is the famous lower-bound theorem for comparison sorting.

Now, we study the average case.

As we analyzed earlier, given a comparison sorting algorithm, its average complexity can be measured by the average path length in the corresponding decision tree T, that is, the average length of a path from the root to a leaf. To compute this average, we first compute the sum of the lengths of all root-to-leaf paths. Because a leaf is also called an external node, this sum is usually called the external path length (EPL).

Let L be the set of leaves. Then

EPL(T) = Σ_{x∈L} |path from root to x|.

After EPL is obtained, the average complexity = EPL(T)/|L|. We will show that EPL(T)/|L| = Ω(n lg n).
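As a small illustration (a sketch of our own; the tuple-based tree representation is not from the notes), the following Python function computes EPL for a binary tree whose internal nodes are (left, right) pairs and whose leaves are plain values:

def epl(tree, depth=0):
    # Internal nodes are (left, right) tuples; anything else is a leaf.
    if not isinstance(tree, tuple):
        return depth
    left, right = tree
    return epl(left, depth + 1) + epl(right, depth + 1)

# A 6-leaf tree shaped like the 3-element decision tree sketched earlier:
t = (("abc", ("acb", "cab")), ("bac", ("bca", "cba")))
print(epl(t), epl(t) / 6)  # EPL = 16, so the average path length is about 2.67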

Definition 1 A binary tree with L leaves is called the minimum EPL tree if its EPL value is the smallest among all binary trees with L leaves.

Obviously, a minimum EPL tree must be a full binary tree: if a node u has only one child v, we can shrink the edge (u, v) and reduce the EPL, as illustrated by Fig. 8-3.

Fig. 8-3

Moreover, we have the following lemma.

Lemma 2 In a minimum EPL tree, all leaves must be on the bottom two levels.

Proof. Suppose a full binary tree T has height k, and a leaf x occurs at level d, where d < k − 1, as illustrated in Fig. 8-4(a).

Fig. 8-4 (a) Tree T; (b) Tree T'

We will prove that this tree cannot be a minimum EPL tree. The reason is as follows. If we cut two leaves a and b at level k (children of a node y) and hook them onto the leaf x, we transform T into a new binary tree T' with the same number of leaves, as shown in Fig. 8-4(b). The EPL of tree T' is smaller than the EPL of T:

EPL(T') = EPL(T) + length(y) − length(x) + (length change of {a, b})
= EPL(T) + (k−1) − d + [2(d+1) − 2k]
= EPL(T) + (d+1) − k.

Since d < k − 1, we have (d+1) − k < 0, so EPL(T') < EPL(T), and T is not a minimum EPL tree. ∎

Corollary 1 The EPL of the minimum EPL tree with L leaves is larger than L(lg L − 1). (By Lemmas 1 and 2, such a tree has height h ≥ lg L and every leaf lies on one of the bottom two levels, so every leaf has depth at least h − 1 ≥ lg L − 1.)

Theorem 2 Any comparison sorting algorithm for n numbers requires Ω(n lg n) comparisons in the average case.


Proof. Let T be the corresponding decision tree for the comparison sorting algorithm. As we discussed, the average number of comparisons can be measured by

A(n) = EPL(T)/L,

where L is the number of leaves in the tree T.

By Corollary 1, EPL(T) > L(lg L − 1).

We have A(n) = EPL(T)/L > lg L − 1.

Because L ≥ n!, we have A(n) > lg(n!) − 1 = Ω(n lg n). ∎

From the above discussion, we know that, in order to break the Ω(n lg n) bound, we must design non-comparison sorting algorithms. In the following, we discuss counting sort, radix sort, and bucket sort.

8.2 Counting Sort

Counting sort does not rely upon comparisons between numbers, but it requires that:

(1) The n input numbers a1, a2, …, an must be integers.

(2) The n input numbers must be in a limited range:

0 ≤ a1, a2, …, an ≤ k, and k = O(n).

Let the input numbers be stored in array A[1..n], satisfying the above conditions. The following counting sort produces the sorted sequence in array B[1..n].

Counting-Sort(A[1..n], B[1..n], k)
1   for i ← 0 to k
2       do C[i] ← 0
3   for j ← 1 to n
4       do C[A[j]] ← C[A[j]] + 1
5   // C[i] = number of elements equal to i
6   for i ← 1 to k
7       do C[i] ← C[i] + C[i-1]
8   // C[i] = number of elements less than or equal to i
9   for j ← n downto 1
10      do { i ← A[j]
11           B[C[i]] ← i
12           C[i] ← C[i] - 1
13         }
14  End

A careful reader may notice that the for loop at line 9 runs from n down to 1. Can we do it from 1 to n? We leave this question to the reader.

Example.

Input: A[1..8], k = 5.

After line 4 of the algorithm, C[i] (for i = 0, …, 5) holds the number of elements equal to i. After line 7, C[i] holds the number of elements less than or equal to i. The loop at line 9 then places A[8], A[7], A[6], and so on into their final positions in B, decrementing the corresponding entry of C after each placement, until B[1..8] contains the sorted sequence.

The complexity of counting sort is O(n + k) = O(n) (since k = O(n)), because each loop in the algorithm takes either O(n) or O(k) steps.
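To make the index arithmetic concrete, here is a minimal Python sketch of the same procedure (our own illustration, using 0-based arrays instead of the 1-based pseudocode; the sample input is arbitrary):

def counting_sort(a, k):
    # Stable counting sort of integers in the range 0..k.
    n = len(a)
    c = [0] * (k + 1)
    for x in a:                 # after this loop, c[i] = #elements equal to i
        c[x] += 1
    for i in range(1, k + 1):   # now c[i] = #elements <= i
        c[i] += c[i - 1]
    b = [0] * n
    for x in reversed(a):       # right-to-left scan keeps equal keys stable
        c[x] -= 1               # 0-based: decrement before placing
        b[c[x]] = x
    return b

print(counting_sort([2, 5, 3, 0, 2, 3, 0, 3], 5))  # [0, 0, 2, 2, 3, 3, 3, 5]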

8.3 Radix Sort

Assume each input number has d digits and each digit takes one of k possible values. Radix sort sorts the numbers digit by digit, from the least significant (rightmost) digit to the most significant (leftmost) digit.

Radix-Sort(A, d)
1   for i ← 1 to d
2       do use a stable sort to sort array A on digit i
3   End

Example

329   720   720   329
457   355   329   355
657   436   436   436
839   457   839   457
436   657   355   657
720   329   457   720
355   839   657   839

(The first column is the input; each later column shows the array after a stable sort on the next digit, from least to most significant.)

Theorem 3 Given n d-digit numbers in which each digit can take one of k possible values, Radix-Sort correctly sorts them in O(d(n + k)) time.

Proof. We prove the correctness of Radix-Sort first. Let the n numbers be stored in the array A[1..n]. We define two functions:

F_i(x) = digit i of x,
G_i(x) = the rightmost i digits of x.

For example, F_2(2431) = 3 and G_2(2431) = 31.

Let A_i[1..n] be the n numbers stored in array A after Radix-Sort has sorted on the first i digits.

We will show that if 1 ≤ u < v ≤ n, then

G_i(A_i[u]) ≤ G_i(A_i[v]) for any i.

We prove this by induction.

When i = 1, G_i(A_i[u]) ≤ G_i(A_i[v]) is obviously true, because A_1[1..n] contains the n numbers of A[1..n] sorted by the rightmost digit.

Suppose G_i(A_i[u]) ≤ G_i(A_i[v]) is true.

We prove that G_{i+1}(A_{i+1}[u]) ≤ G_{i+1}(A_{i+1}[v]) is also true.

Because A_{i+1}[1..n] is obtained from A_i[1..n] by sorting on digit (i+1), we have

F_{i+1}(A_{i+1}[u]) ≤ F_{i+1}(A_{i+1}[v]).

If F_{i+1}(A_{i+1}[u]) < F_{i+1}(A_{i+1}[v]), then G_{i+1}(A_{i+1}[u]) < G_{i+1}(A_{i+1}[v]), and we are done. Otherwise,

F_{i+1}(A_{i+1}[u]) = F_{i+1}(A_{i+1}[v]).

Because the sort on digit (i+1) is stable, the number A_{i+1}[u] appeared before the number A_{i+1}[v] in the array A_i[1..n]. Therefore, G_i(A_{i+1}[u]) ≤ G_i(A_{i+1}[v]), which implies that

G_{i+1}(A_{i+1}[u]) ≤ G_{i+1}(A_{i+1}[v]).

The complexity of Radix-Sort is O(d(n + k)), because we can use counting sort to sort each digit in O(n + k) time. ∎
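Here is a minimal Python sketch of the same idea (our own illustration, assuming non-negative base-k integers); distributing into k lists in scan order serves as the stable per-digit sort, and counting sort would work equally well:

def radix_sort(a, d, k=10):
    # Sort d-digit base-k integers, least significant digit first.
    for i in range(d):
        buckets = [[] for _ in range(k)]
        for x in a:                            # stable: scan order is kept
            buckets[(x // k**i) % k].append(x)
        a = [x for b in buckets for x in b]    # concatenate buckets in order
    return a

print(radix_sort([329, 457, 657, 839, 436, 720, 355], 3))
# [329, 355, 436, 457, 657, 720, 839]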

8.4 Bucket Sort

Bucket sort is another non-comparison sorting algorithm. For bucket sort, we assume the n input numbers in A[1..n] lie in the interval between 0 and 1: 0 ≤ A[i] < 1 for 1 ≤ i ≤ n. Moreover, we divide the interval [0, 1) into n equal-sized subintervals called buckets.

Then, the n numbers are distributed among the n buckets.

Because

0 ≤ A[i] < 1, 1 ≤ i ≤ n,

we have

0 ≤ n·A[i] < n

and

0 ≤ ⌊n·A[i]⌋ < n.

Therefore, we place A[i] in bucket j if ⌊n·A[i]⌋ = j.

After the distribution, we sort the numbers in each bucket and concatenate the n buckets in order.

Bucket-Sort(A[1..n])
1   for i ← 1 to n
2       do { j ← ⌊n·A[i]⌋
3            insert A[i] into list B[j]
4          }
5   for i ← 0 to n-1
6       do sort list B[i] with insertion sort
7   concatenate the lists B[0], B[1], …, B[n-1] in order
8   End
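A minimal Python sketch follows (our own illustration; sorted() stands in for the per-bucket insertion sort, and the random input is only for demonstration):

import random

def bucket_sort(a):
    # Assumes all values satisfy 0 <= a[i] < 1.
    n = len(a)
    buckets = [[] for _ in range(n)]
    for x in a:                    # place x in bucket floor(n*x)
        buckets[int(n * x)].append(x)
    out = []
    for b in buckets:              # sort each bucket, concatenate in order
        out.extend(sorted(b))
    return out

a = [random.random() for _ in range(10)]
print(bucket_sort(a) == sorted(a))  # True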

Example. (Figure omitted: (a) an input array A[1..10]; (b) the lists B[0..9] holding the distributed numbers.)

Complexity of the Bucket Sort

Let n_i be the number of elements that fall into bucket i. Then

T(n) = Θ(n) + Σ_{i=0}^{n-1} O(n_i²),    (1)

where Σ_{i=0}^{n-1} n_i = n.

Because

n² = (Σ_{i=0}^{n-1} n_i)² = Σ_{i=0}^{n-1} n_i² + 2 Σ_{i≠j} n_i·n_j,

we have

T(n) = Θ(n) + Σ_{i=0}^{n-1} O(n_i²) = O(n²).

This is the worst-case complexity. It can be improved to O(n lg n) (Exercise 8.4-2).

Now, we prove that the average time is O(n).

We compute the expectation of (1):

E[T(n)] = E[Θ(n) + Σ_{i=0}^{n-1} O(n_i²)] = Θ(n) + Σ_{i=0}^{n-1} E[O(n_i²)]
        = Θ(n) + Σ_{i=0}^{n-1} O(E[n_i²]).

We will show that E[n_i²] = 2 − 1/n.

Let X_ij be the random variable such that

X_ij = 1 if A[j] falls in bucket i, and X_ij = 0 otherwise.

So, n_i = Σ_{j=1}^{n} X_ij, and

E[n_i²] = E[(Σ_{j=1}^{n} X_ij)²] = E[(X_i1 + X_i2 + … + X_in)²]
        = E[Σ_{j=1}^{n} X_ij² + Σ_{1≤j≤n} Σ_{1≤k≤n, k≠j} X_ij·X_ik]
        = Σ_{j=1}^{n} E[X_ij²] + Σ_{1≤j≤n} Σ_{1≤k≤n, k≠j} E[X_ij·X_ik].

For j ≠ k, the variables X_ij and X_ik are independent, so E[X_ij·X_ik] = E[X_ij]·E[X_ik] = (1/n)². Moreover, Pr[X_ij = 1] = 1/n, so E[X_ij²] = (1/n)·1² = 1/n. We have

E[n_i²] = Σ_{j=1}^{n} (1/n) + Σ_{1≤j≤n} Σ_{1≤k≤n, k≠j} (1/n)²
        = 1 + n(n−1)·(1/n²)
        = 2 − 1/n.

Therefore,

E[T(n)] = Θ(n) + Σ_{i=0}^{n-1} O(E[n_i²]) = Θ(n) + Σ_{i=0}^{n-1} O(2 − 1/n) = Θ(n).
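As a sanity check (our own experiment, not part of the original notes; the function name and parameters are illustrative), a short simulation can confirm E[n_i²] = 2 − 1/n empirically:

import random

def mean_sq_bucket_count(n, trials=20000):
    # Drop n uniform keys into n buckets and record how many land in
    # bucket 0; return the empirical average of that count squared.
    total = 0
    for _ in range(trials):
        n0 = sum(int(n * random.random()) == 0 for _ in range(n))
        total += n0 * n0
    return total / trials

n = 50
print(mean_sq_bucket_count(n), 2 - 1/n)  # both should be close to 1.98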
