Divide-and-Conquer Algorithms


Document last modified: 

Divide - break the problem into several subproblems that are similar to the original problem but smaller in size

Conquer - solve the subproblems recursively. 

Combine - the solutions to create a solution for the original problem

Analysis of recursive algorithms

Iterative algorithms can often be analyzed by counting instruction executions for some problem size n, as was done for the Insertion sort.

Recursive algorithms solve a problem based on recurring solution(s) to progressively smaller subproblem(s). Accurate analysis requires an accounting of this recurrence.

recurrence equation - describes the running time of a problem of size n of an algorithm that contains a recursive call to itself, in terms of the running time on smaller inputs of the same problem type.

T(n) = recurrence equation defining the running time on problem of size n


Binary Search - a degenerate divide-and-conquer search algorithm, no combine phase.

Searches for a key in a sorted vector, returning the index where the key was found or -1 when not found.

Reduces the problem size by half each recursion.

The algorithm definition in English:

  • Divides sorted vector into 2 halves.
  • The lower half contains values less or equal the key and the higher half values greater than or equal the key.
  • If the low index exceeds the high index the key is not in the vector.
  • Compute the middle index of the vector.
  • If the key equals the value at the middle of the vector, the index of the middle is returned
  • If the key is less than the value at the middle of the vector, the lower half is searched
  • If the key is greater than the value at the middle of the vector, the higher half is searched
  • Only one of the halves is searched, reducing the problem size by half each time.
BinarySearch (A, key, low, high)
-- preserves A, key
-- alters low, high
-- pre: A[] is sorted, low and high in {0..A.length-1}
-- post: result == index where key==A[index] else result == -1
1 if (low > high) return -1;
2 int mid = (low+high)/2;
3 if (key == A[mid]) return mid;
4 if (key > A[mid])
5           return BinarySearch(A, key, mid+1, high);
6 else    return BinarySearch(A, key, low, mid-1);

Question 2.23

  1. Trace execution for key 22 for the call: BinarySearch(A, 22, 0, 14).
    low high mid
         
         
         
         
  2. How many executions of BinarySearch to find 22?
  3. For n=16 (24) 4 executions are required. For n=32 (25) 5 executions are required.
    How many executions for problem size n=2i.

    Recall: logaab = b

    lg 2b = log22b = b.

  4. What is the growth rate?
Binary Search tree

Analyzing binary search

BinarySearch (A, key, low, high)
1 if (low > high) return -1;
2 int mid=(low+high)/2;
3 if (key == A[mid]) return mid;
4 if (key > A[mid])
5           return BinarySearch(A, key, mid+1, high);
6 else    return BinarySearch(A, key, low, mid-1);
           | Θ(1)                   if n = 1, the base case
T(n) = |
           |
T(n/2) + Θ(1)     if n > 1, the recursive case

Question 2.24.1 Explain in English what is meant by either of the two recurrences above.

 

Solving the binary search recurrence

BinarySearch (A, key, low, high)
1 if (low > high) return -1;
2 int mid=(low+high)/2;
3 if (key == A[mid]) return mid;
4 if (key > A[mid])
5           return BinarySearch(A, key, mid+1, high);
6 else    return BinarySearch(A, key, low, mid-1);

Recursion tree

Drawing the recurrence as a tree

T(n) Level 0 cost: c
to divide problem

Only 1 subproblem solved at each level, cost: c
to divide problem

Only 1 subproblem solved at each level, cost: c

lg n height tree has lg n + 1 levels.

Each level has cost c.
  • The top level has cost c.
  • The next level down has 2 possible subproblems, only one solved contributing cost c.
  • The next level has 4 possible subproblems, only one solved contributing cost c.
  • At each lower level, the number of possible subproblems doubles but only one is solved as the cost per subproblem (level) of c.

Hypothesis that there are lg n + 1 levels (height is lg n) for problem size n.

  • Use induction to prove.
    • Inductive hypothesis:

      A tree for a problem size of n=2i has:

      lg n + 1 = lg 2i+1 = i + 1 levels

      Recall that logaab = b

    • Base case:

      For n = 1 then 1 level, the root at level 0, and:

      lg n + 1 = lg 2i+1 = lg 20+1 = 0 + 1 = 1.

    • Assumed that the problem size is a power of 2
       
    • Induction Hypothesis

       lg 2i + 1 = i + 1 levels.

    • Next problem size after 2i is 2i+1
       
    • Show that a tree for a problem size of 2i+1 has one more level than the size 2i tree:

     lg 2i+1 + 1 = i + 2 levels.

    • Proof
lg 2i + 1 = i + 1 Induction Hypothesis
lg 2i+1 + 1 = lg (2i*21) + 1 Multiplication Law of Exponents
  = lg 2i + lg 21 + 1 Logarithm of a Product
  = (lg 2i + 1) + 1 lg 21 = 1
  = (i + 1) + 1 Induction Hypothesis: n=2i has i+1 levels
  = i + 2 ∴ n=2i+1 has i+2 levels

Total cost is sum of costs at each level.

Have lg n +1 levels, each costing c so total cost:

c*(lg n +1) = c lg n + c.

Ignore low-order term of c and constant coefficient c, can indicate cost:

T(n) = Θ(lg n).

 

Merge_Sort - a divide-and-conquer sorting algorithm with a combine phase.

Merge_Sort algorithm definition is essentially:

Merge algorithm definition is essentially:

Merge (A, p, q, r)
-- alters A
-- preserves p, q, r
-- pre: p ≤ q < r  &  A[p..q] & A[(q + 1)..r] sorted
-- post: A[p..r] sorted
--         ∀m; p≤m<r; A[m]≤A[m+1]
1 n1 ← q - p + 1
2 n2 ← r - q
3 -- Copy L[1..n1]←A[p..q]
--         R[1..n2]←A[q+1..r]
4 for i ← 1 to n1 do
5    L[i] ← A[p + i - 1]
6 for j ← 1 to n2 do
7    R[j] ← A[q + j]
8 L[n1 + 1] ← ∞
9 R[n2 + 1] ← ∞
10 i ← 1
11 j ← 1
  -- ∀m; p ≤ m < k-1; A[m] ≤ A[m+1]
12 for k ← p to r do
13    if L[i] ≤ R[j] then
14       A[k] ← L[i]
15       i ← i + 1
16    else
17       A[k] ← R[j]
18       j ← j + 1
Merge_Sort (A, p, r)
-- alters A
-- post: A[p..r] is sorted
--   ∀m; p ≤ m < r; A[m] ≤ A[m+1]
1 if p < r then
2    q ← floor ((p + r) / 2)
3    Merge_Sort (A, p, q)
4    Merge_Sort (A, q + 1, r)
5    Merge (A, p, q, r)

Example

Merge_Sort(A, 9, 16)

Merge_Sort(A, 9, 12)

Merge_Sort(A, 13, 16)

Merge(A, 9, 12, 16)

Merge_Sort(A, 9, 16) result

Question 2.24.2

What is the order of Merge?

Merge( A, 9, 12, 16)

 Sort/Merge illustrations

Array size 2n

Balanced tree

Array size not 2n

Not balanced tree

  

Analyzing Merge_Sort

Merge_Sort (A, p, r)
-- alters A
-- post: A[p..r] is sorted
--         ∀m; p ≤ m <r ; A[m] ≤ A[m+1]
1 if p < r then                                     Run Time
2    q ← floor ((p + r) / 2) D(n) = Θ(1)
3    Merge_Sort (A, p, q) T(n/2)
4    Merge_Sort (A, q + 1, r) T(n/2)
5    Merge (A, p, q, r) C(n) = Θ(n)

recurrence equation - describes the running time of a problem of size n of an algorithm that contains a recursive call to itself, in terms of the running time on smaller inputs of the same problem type.

T(n) = recurrence equation defining the running time on problem of size n

           | Θ(1)                   if n = 1
T(n) = |
           |
2T(n/2) + Θ(n)   if n > 1

Question 2.25 Explain in English what is meant by either of the two recurrences above.

 

Analyzing Divide-and-Conquer in general

           | Θ(1)                   if n = 1
T(n) = |
           |
2T(n/2) + Θ(n)   if n > 1

Solving the Merge_Sort recurrence

Recursion tree

Start with cost:

T(n) = 2T(n/2) + cn

For each of size n/2 subproblems

T(n/2) =  2T(n/4) + cn/2

Continue expanding until problem sizes are 1

Question 2.26    For n = 16 in Merge-Sort

  1. Each recursion divides the problem size by 2. How many divisions before the problem size is 1?

  2. What is the height of the tree?
     
  3. How many levels? Does that agree with the diagram below of sorting 16 elements?

  4. State the height as a function of the problem size, 16.

  5. How many sub-problems of size 1?

  6. Each recursion doubles the number of sub-problems. State the number of sub-problems at the bottom of this tree as a function of the problem size, 16, and height.

 

 

Prove: Cost of recursion tree is cn lg n + cn

We can then use as a guess that: T(n) = Θ(n * log2 n)

Each level has cost cn.

  • The top level has cost cn, cost to merge solutions.
    • Total cost at this level = cn
  • The next level down has 2 subproblems, each contributing cost cn/2.
    • Total cost at this level = 2*(cn/2) = cn
  • The next level has 4 subproblems, each contributing cost cn/4.
    • Total cost at this level = 4*(cn/4) = cn
  • At each lower level, the number of subproblems doubles but the cost per subproblem halves, sum of the cost per level stays the same.

Hypothesis that for problem size n, there are lg n + 1 levels (height is lg n).

  • Base case:

    n = 1 then 1 level, and lg 1 + 1 = 0 + 1 = 1.

  • Inductive hypothesis is that a tree for a problem size of n=2i has:

    lg 2i+1 = i+1 levels.

  • Because we assume that the problem size is a power of 2, the next problem size up after 2i is 2i+1
     
  • A tree for a problem size of 2i+1 has one more level than the size 2i tree:

     i + 2 levels.

    lg 2i+1 + 1 = i + 2

 

Total cost is sum of costs at each level.

lg n +1 levels, each costing cn so total cost:

cn * (lg n + 1) = cn lg n + cn

Ignore low-order term of cn and constant coefficient c, cost:

Θ(n lg n)

Can now state a reasonable guess that:

T(n) =  Θ(n lg n)

We'll prove this later in Chapter 4.

 

Merge_Sort solution of size n problem grows as n lg n grows, not n (linear) but slower than n2.