Divide-and-Conquer Algorithms |
Document last modified: |
Divide - break the problem into several subproblems that are similar to the original problem but smaller in size
Conquer - solve the subproblems recursively.
- Base case: If the subproblem size is small enough (i.e., the base case has been reached) then solve the subproblem directly without more recursion.
Combine - the solutions to create a solution for the original problem
Analysis of recursive algorithms
Iterative algorithms can often be analyzed by counting instruction executions for some problem size n, as was done for the Insertion sort.
Recursive algorithms solve a problem based on recurring solution(s) to progressively smaller subproblem(s). Accurate analysis requires an accounting of this recurrence.
recurrence equation - describes the running time of a problem of size n of an algorithm that contains a recursive call to itself, in terms of the running time on smaller inputs of the same problem type.
T(n) = recurrence equation defining the running time on problem of size n
Binary Search - a degenerate divide-and-conquer search algorithm,
no combine phase.
The algorithm definition in English:
|
|
Question 2.23
|
|||||||||||||||||||||||||||||
| Binary Search tree
|
Analyzing binary search
- For simplicity, assume that problem size n is power of 2 and each divide step yields subproblems of size n/2.
- The number of iterations of
N := N/2
until N = 1 is the worst-case performance (i.e. don't find value).
Example
N := 16 while N > 1
N := N / 2
N := 16/2 = 8 N := 8/2 = 4
N := 4/2 = 2
N := 2/2 = 1
- 16/16 = 16/24 = 1
- implies a k such that
N/2k = 1
k is the number of divisions (i.e. iterations) until N = 1.
N/2k = 1
N = 2k
log2 N = log2 2k = k
k = log2 N
- The base case occurs when n = 1; that is mid=low=high.
- When n ≥ 2, times for binary search steps:
- Divide (lines 1-4): Compute mid as the average of low and high:
D(n) = Θ(1)
- Conquer (lines 5 or 6): Recursively solve one subproblem of size n/2:
T(n/2)
- Combine: There is no combine step
BinarySearch (A, key, low, high) 1 if (low > high) return -1; 2 int mid=(low+high)/2; 3 if (key == A[mid]) return mid; 4 if (key > A[mid]) 5 return BinarySearch(A, key, mid+1, high); 6 else return BinarySearch(A, key, low, mid-1);
- Binary Search's Recurrence is then:
| D(n) if n = 1, the base case
T(n) = |
| T(n/2) + D(n) if n > 1, the recursive caseor equivalently
| Θ(1) if n = 1, the base case
T(n) = |
| T(n/2) + Θ(1) if n > 1, the recursive caseQuestion 2.24.1 Explain in English what is meant by either of the two recurrences above.
Solving the binary search recurrence
T(n) = T(n/2)
- For Binary Search the closed solution is: T(n) = Θ (lg n)
- We can then determine the run time of Binary Search to solve a problem of size n=16
lg 16 = lg 24 = 4
requires a maximum of 4 executions.
BinarySearch (A, key, low, high) 1 if (low > high) return -1; 2 int mid=(low+high)/2; 3 if (key == A[mid]) return mid; 4 if (key > A[mid]) 5 return BinarySearch(A, key, mid+1, high); 6 else return BinarySearch(A, key, low, mid-1); Recursion tree
- For example above, Binary Search produces the possible recursions
- We can draw the recursion tree of the recurrence to give us insight into why cost is: Θ (lg n)
- Assume the cost of testing for the key, dividing the problem into 2 subproblems and determining which subproblem to solve (lines 1-4) requires a constant time c
- Can rewrite the recurrence
| Θ(1) if n = 1, the base case
T(n) = |
| T(n/2) + Θ(1) if n > 1, the recursive caseas:
| c if n = 1
T(n) = |
| T(n/2) + c if n > 1Drawing the recurrence as a tree
| T(n) | Level 0 cost: c to divide problem
|
Only 1 subproblem solved at each level, cost: c to divide problem
|
||||||||||||||||||
|
Only 1 subproblem solved at each level, cost: c lg n height tree has lg n + 1 levels.
|
||||||||||||||||||||
Each level has cost c.
Hypothesis that there are lg n + 1 levels (height is lg n) for problem size n.
|
||||||||||||||||||||
Total cost is sum of costs at each level.
Have lg n +1 levels, each costing c so total cost:
c*(lg n +1) = c lg n + c.
Ignore low-order term of c and constant coefficient c, can indicate cost:
T(n) = Θ(lg n).
Merge_Sort - a divide-and-conquer sorting algorithm with a combine phase.
Merge_Sort algorithm definition is essentially:
- If vector length = 1 the result is sorted.
- If vector length > 1
Divide vector into 2 halves and execute Merge_Sort on each halve.
Merge to combine the sorted halves, result is sorted.
Merge algorithm definition is essentially:
- Copy array A elements making up the two halves to merge to array L (left) and R (right).
- Place a sentinel, ∞, as the last element value of L and R to indicate the end of each.
- Merge all elements from L and R (up to ∞) to A in ascending order (for this example).
|
Example
Merge_Sort(A, 9, 16) result
|
||||||||||||||||||||||||||||||||||||||||||||||||||||
Question 2.24.2
What is the order of Merge?
Merge( A, 9, 12, 16)
![]() |
![]() |
Sort/Merge illustrations
|
Array size 2n Balanced tree |
Array size not 2n Not balanced tree |
Analyzing Merge_Sort
Merge_Sort (A, p, r)
-- alters A
-- post: A[p..r] is sorted
-- ∀m; p ≤ m <r ; A[m] ≤ A[m+1]1 if p < r then Run Time 2 q ← floor ((p + r) / 2) D(n) = Θ(1) 3 Merge_Sort (A, p, q) T(n/2) 4 Merge_Sort (A, q + 1, r) T(n/2) 5 Merge (A, p, q, r) C(n) = Θ(n) recurrence equation - describes the running time of a problem of size n of an algorithm that contains a recursive call to itself, in terms of the running time on smaller inputs of the same problem type.
T(n) = recurrence equation defining the running time on problem of size n
- For simplification, assume n is a power of 2 (i.e. 2n ) so that each divide step yields 2 subproblems of size n/2.
- The base case occurs when n = 1 or p ≥ r.
- When n ≥ 2, time for merge sort steps:
- Divide: Just compute q as the average of p and r:
D(n) = Θ(1)
- Conquer: Recursively solve 2 subproblems, each of size n/2:
T(n/2) + T(n/2) = 2T(n/2)
- Combine: MERGE on an n-element subarray takes Θ(n) time:
C(n) = Θ(n)
- Since D(n) = Θ(1) and C(n) = Θ(n)
- summed together they give a function that is linear in n:
D(n) + C(n) = Θ(1) + Θ(n) = Θ(n)
- Merge_Sort's Recurrence is:
| c if n = 1, Base case
T(n) = |
| 2T(n/2) + D(n) + C(n) if n > 1, Recursive caseor equivalently
| Θ(1) if n = 1
T(n) = |
| 2T(n/2) + Θ(n) if n > 1Question 2.25 Explain in English what is meant by either of the two recurrences above.
Analyzing Divide-and-Conquer in general
|
| Θ(1) if n
≤ c,
Base case T(n) = | | a * T(n/b) + D(n) + C(n) if n > c, Recursive case |
- a - is the number of subproblems
- 1/b - each subproblem is 1/b of the size of the original problem
- when n <= c, then the problem can be solved directly
- D(n) - time to Divide the original problem into a subproblems
- C(n) - time to combine the solutions to the a subproblems
| Θ(1) if n = 1
T(n) = |
| 2T(n/2) + Θ(n) if n > 1
- a = 2
- b = 2
- D(n) = Θ(1) divide step
- C(n) = Θ(n) combine step
Solving the Merge_Sort recurrence
T(n) = 2T(n/2) + Θ(n)
For Merge_Sort the closed solution is:
T(n) = Θ(n * log2 n)
Later we'll use the Master Method for solving recurrence relations.
We can draw the recursion tree to give us insight into why it is Θ(n * log2 n)
Recursion tree
|
| c if n = 1 T(n) = | | 2T(n/2) + cn if n > 1 |
where c is a constant that describes the running time for the base
case and the time per array element for the divide and conquer steps.
|
Start with cost:
|
For each of size n/2 subproblems
|
|
|
Continue expanding until problem sizes are 1
|
||
Question 2.26 For n = 16 in Merge-Sort
- Each recursion divides the problem size by 2. How many divisions before the problem size is 1?
- What is the height of the tree?
- How many levels? Does that agree with the diagram below of sorting 16 elements?
- State the height as a function of the problem size, 16.
- How many sub-problems of size 1?
- Each recursion doubles the number of sub-problems. State the number of sub-problems at the bottom of this tree as a function of the problem size, 16, and height.
Prove: Cost of recursion tree is cn lg n
+ cn
Each level has cost cn.
Hypothesis that for problem size n, there are lg n + 1 levels (height is lg n).
|
Total cost is sum of costs at each level.
lg n +1 levels, each costing cn so total cost:
cn * (lg n + 1) = cn lg n + cn
Ignore low-order term of cn and constant coefficient c, cost:
Θ(n lg n)
Can now state a reasonable guess that:
T(n) = Θ(n lg n)
We'll prove this later in Chapter 4.
Merge_Sort solution of size n problem grows as n lg n grows, not n (linear) but slower than n2.