# Chapter 16 - Greedy Algorithms

Overview

Greedy algorithms are an approach to solving certain kinds of optimization problems.

Greedy algorithms are similar to dynamic programming algorithms in that the solutions are both efficient and optimal if the problem exhibits some particular sort of substructure.

A greedy algorithm builds a solution by going one step at a time through the feasible solutions, applying a heuristic to determine the best choice.

A heuristic applies an insight to solving the problem, such as always choose the largest, smallest, etc.

Elements of the greedy strategy

Greedy algorithms should be applied to problems exhibiting these two properties:
• greedy choice property: a globally optimal solution can be arrived at by making locally optimal (greedy) choice.

When a locally optimal choice can lead to a globally optimal solution, the problem has this property.

Applied to other problems that don't have this property, such as the Traveling (now tired) Salesman Problem, greedy algorithms may not find the optimal solution, but might find one that is "good enough" for some applications, especially when the alternative is an exponential-time brute force solution.

• optimal substructure: As in dynamic programming. An optimal solution to a problem contains within it an optimal solution to subproblems.

Example: Recall D˙kstra's Algorithm:

finding the shortest path (z, x) was a subproblem of finding the shortest path (s, x)

Heuristic - a criteria used to determine the choice most likely to lead to an optimal solution.

A greedy algorithm applies the heuristic to make the best choice at that moment.

• Recall the knapsack problem; the heuristic was to always use the maximum value for a given capacity.

• For Dykstra's Algorithm; the heuristic was to always choose the shorter of two paths.

In both cases, the local use of the heuristic led to a globally optimal solution.

• Shortest path used the RELAX heuristic: choose smallest cost to reach a vertex.

• locally optimal when better solution can occur elsewhere.

• globally optimal when the solution is the best possible.

Solution space - the set of all possible solutions.

Sorts have only one valid solution but n! possible solutions (permutations) in solution space to consider.

Making change for \$1 has many valid solutions (100 pennies, 20 nickels) but only one solution is optimal for the fewest coins (2 half dollars).

Example: The solution space for all 3-bit binary numbers could be represented by a cube, at right, where each vertex corresponds to one of the numbers.

Searching for the maximum 3-bit value could be done by starting at an arbitrary vertex and, by some means (a heuristic), determining which vertex to examine next.

A greedy algorithm heuristic might always choose the largest valued adjacent vertex in a maximization search.

Question 16.1

1. Starting at vertex 100, apply a greedy search using the heuristic: "choose the largest adjacent vertex, stop if all are smaller" to find the maximum 3-bit value on the upper-right figure.

2. Did a locally optimal choice lead to a globally optimal solution?

3. Repeat the method starting at vertex 000.

4. Did a locally optimal choice lead to a globally optimal solution?

5. Does this problem have the greedy-choice property, a globally optimal solution can be arrived at by making locally optimal (greedy) choice?

Huffman codes for optimal data compression

Why do Huffman codes matter?

From Algorithms by Dasgupta, Papdimitriou and Vazirani.

In the MP3 compression scheme, a sound signal is encoded in three steps.

1. Analog signal is digitized by sampling at regular intervals yielding a sequence of real numbers s1, s2, ..., sT.

The figure at right illustrates encoding an analog 0 to 1 signal as a 0002 to 1112 value; where 0.5 to 0.625 is encoded as 1002.

The figure below illustrates the encoding of an analog signal (wavy curve) to a discrete value (bar).

At a sampling rate of 44,100 samples per second, a mono-channel 50-minute performance would correspond to T=50*60*44,100 ≈ 130,000,000 measurements.

2. Each sT sample is quantized, approximated to a nearby number from a finite set Γ.

This set is carefully chosen to exploit human perceptual limitations with the intention that the approximating sequence is indistinguishable by the human ear from the original s1, s2, ..., sT sequence.

3. The resulting string of length T over the alphabet Γ is encoded in binary.

Example

Characters are commonly represented by the same number of bits, e.g., the 7-bit ASCII code.

However, some characters occur more frequently in English (or in any languages with alphabets) than others.

Using a variable number of bits for a code such that frequent characters use fewer bits and infrequent character use more bits, we can decrease the space needed to store the same information.

For example, consider the following sentence:

`"dead beef cafe deeded dad.  dad faced a faded cab.  dad acceded.  dad be bad."`
`Frequencies for the 77 characters are:`
 a b c d e f space . 12 4 5 19 12 4 17 4
`Question 16.2 - Which characters of the sentence should we encode in the fewest bits for compression?`
`Example encoding`
 If we use a fixed-length code like this: ```000 (space) 001 a 010 b 011 c 100 d 101 e 110 f 111 .``` `The sentence, which is of length 77, consumes 77 * 3 = 231 bits.` But if we use a variable length code like this: ```100 (space) 110 a 11110 b 1110 c 0 d 1010 e 11111 f 1011 .``` `Then we can encode the text in:` `3*12+4*5+5*4+19*1+12*4+4*5+17*3+4*4 = 230 bits.`

A savings of 1 bit. Not much, but a start.

Prefix code - Note that such a code must be a prefix code, where we can distinguish where one code stops and another starts; one code may not be a prefix of another code or cannot easily determine one code from another.

Example: Because of the encoding d=0, no other codes can have the prefix 0, all others must start with a 1.

If the characters have non-uniform frequency distributions, then finding such a code can lead to great savings in storage space.

A process with this effect is called data compression.

This can be applied to any data where the frequency distribution is known or can be computed, not just sentences in languages.

Examples: computer graphics, digitized sound, binary executables, etc.

Optimal data compression yields savings from 25% to 90%.

There are many possible codes, but would like to find one that is optimal with respect to the number of bits needed to represent the data.

How can we find such a code?

Huffman's Algorithm is a greedy algorithm that constructs an optimal encoding.

Example - an optimal encoding of 100,000 characters

 Left tree not optimal using fixed 3-bit length.    f encoding is 101 or right, left, right. Right tree is optimal.    f encoding is 1100 or right, right, left, left.    Prefix code, no codeword is a prefix of another. 300,000 bits fixed-length 3-bit codeword204,000 bits variable-length codeword Non-optimal                                   Optimal

Using fixed-length encoding:    abc = 001010011

Using optimal encoding:           abc = 0101100

Constructing a Huffman code

1. Insight - lower frequency characters should have longer bit encoding (deeper depth);

start at bottom with lowest frequency characters,

recursively adding the lowest remaining characters.

2. smaller code (highest frequency) is left branch 0,

larger is right branch 1

3. greedy strategy (heuristic) - extract two values from a character frequency min-heap, guaranteed to be the minimum remaining.

4. Diagram lists the minimum codes of any two characters on left branch but could be placed on right at same level with same cost.

Question 16.3 From above Figure (b), why are codes for a=0 and f=1100?

Greedy strategy - why it works

greedy choice property: a globally optimal solution can be arrived at by making locally optimal (greedy) choice. A necessary condition.

1. Greedy heuristic: extract two lowest frequency values from a character frequency min-heap, guaranteed to be the minimum remaining.

2. Combine the two into a subtree, summing the frequencies.

The two lowest frequency character subtrees will start and stay at leaf of code tree.

3. Insert subtree into min-heap.

4. Each time the lowest two values are extracted from min-heap and combined, guaranteeing the lowest frequencies are at bottom of tree and the highest at the top.

5. The highest frequency will have shortest code (length) while the lowest will have the longest.

Greedy choice property holds because the minimum subtree is selected from a min-heap and placed above lower frequency subtrees.

Example - Don't confuse the min-heap, which maintains the priority queue, with the tree constructed containing the Huffman codes.

min-heap

Label each node of the min-heap with the frequency of the letter in the text to be compressed.

This quantity will be called the "value" of the node.

The frequencies may be known beforehand from studies of the language or data, or can be computed by just counting characters.

Initial min-heap used to identify minimum character and its frequency as:

character:frequency

min-heap

Extract 2 minimum frequency subtrees from min-heap, 5 and 9.

Sum frequencies at parent and insert subtree into min-heap.

Min-heap now contains subtree with frequency 14 at parent and children f and e .

Extract 2 minimum frequency entries from min-heap, 12 and 13.

Sum frequencies at parent node and insert into min-heap.

Extract 2 minimum frequency entries from min-heap, 14 and 16.

Sum frequencies at parent node and insert into min-heap.

Extract 2 minimum frequency entries from min-heap, 25 and 30.

Sum frequencies at parent node and insert into min-heap.

Frequency 45 and 55 remain.

Extract 2 minimum frequency entries from min-heap, 45 and 55.

Sum frequencies at parent node and insert into min-heap.

The last iteration inserts the root of the encoding into the min-heap.

The root node has value 100, the total number of characters.

When finished, min-heap contains one entry, the character encoding tree.

Algorithm

The number of bits needed to encode the data is the the sum, for each character, of the number of bits in its code times its frequency.

Let T be the tree

C   the set of characters c that comprise the alphabet

c.freq   the frequency of character c.

dT(c)   the depth of character c in the tree

Since the number of bits is the same as the depth in the binary tree, we can express the sum in terms of:

B(T)=c.freq dT(c)
c C

The sum we want to minimize, called the cost, B(T) of the tree.

In the following algorithm, freq is defined as above; it can be stored efficiently in an array indexed by characters.

freq is extended as needed to accommodate the values of internal tree nodes.

C  the set of characters represented as leaf tree nodes.

Priority queue Q of tree nodes extracts the minimum element with a min-heap.

Build the tree in a bottom up manner, starting with the individual characters and ending up with the root of the tree as the only element of the queue:

 ```Huffman (C) -- pre: C is the characters that comprise the alphabet -- post: return the root of the optimal encoding tree ``` ``` n ← |C| Q ← C -- insert all the elements of C into Q, using the frequency as the priority for i ← 1 to n-1 do z ← a new tree node x ← Extract-Minimum (Q) y ← Extract-Minimum (Q) z.left ← x z.right ← y z.freq ← x.freq + y.freq Insert (Q, z) return Extract-Minimum (Q) -- the root of the encoding tree ```

At first, the min-queue contains all the leaf nodes as a "forest" of singleton binary trees.

Then the two nodes with least value are extracted from the queue, joined by a new tree node, and inserted into queue.

After n-1 iterations, there is only one node left in the queue: the root of the tree.

Question 16.4.1

Where's the greed?

Execute Huffman( C ) on the following:

 c f . b c a e space d c.freq 4 5 6 8 12 13 17 19

What's the code for: d and f.

Example

Let's go through the earlier example using Huffman's algorithm. Below are the contents of Q after each step through the for loop:

1. `Initially, all nodes are leaf nodes. Insert all 8 in Q: `
 c f . b c a e space d c.freq 4 4 4 5 12 12 17 19
2. `Join the two nodes with least value; now there are 7 things in Q: `
 c b c a e space d c.freq 4 5 8  /  \ f     . 4    4 12 12 17 19
3. `Then the next two with least value, Q has 6 elements: `
 c a e space d c.freq 8  /  \ f     . 4    4 9  /  \ b    c 4    5 12 12 17 19
4. `Now the two nodes with least values are the two trees just made, and Q has 5 elements: `
 c a e space d c.freq 12 12 17      /    \   8        9  /  \     /  \ f     .  b    c 4    4 4    5 17 19
5. `Q has 4 elements: `
 c space d c.freq 17      /    \   8        9  /  \     /  \ f     .  b    c 4    4 4    5 17 19 24   /  \  a    e 12  12
6. `Three items left: `
 c d c.freq 19 24  /  \ a    e 12  12 34           /    \       17      space      /    \       17   8        9  /  \     /  \ f     .  b    c 4    4 4    5
7. `Two big trees left: `
 c c.freq 34           /    \       17      space      /    \       17   8        9  /  \     /  \ f     .  b    c 4    4 4    5 43      /    \     d     24    19    /  \          a    e         12  12
8. `Finally, join the whole thing: `
 c c.freq 77                   /           \            34              43           /    \           /    \       17      space  d     24      /    \       17   19   /   \   8        9                 a    e  /  \     /  \               12  12 f     .  b    c 4    4 4    5
`At each point, the heuristic chooses the joining that would force less frequent characters to be deeper in the tree. `
 `So an optimal prefix code is: ` ```01 (space) 110 a 0010 b 0011 c 10 d 111 e 0000 f 0001 .``` `And B(T) = 17*2+12*3+4*4+5*4+19*2+12*3+4*4 = 196 bits.` ```A savings of 15% in the size of the data. ``` Our initial variable length code: ```100 (space) 110 a 11110 b 1110 c 0 d 1010 e 11111 f 1011 .``` `can encode the same text in:` `3*12+4*5+5*4+19*1+12*4+4*5+17*3+4*4 = 230 bits.`

Divide-and-conquer

Naturally divides problems into subproblems.

When one subproblem has a better solution than another, can construct greedy divide-and-conquer algorithm.

We see later, this is an important technique in multithreaded (parallel) algorithms.

Question 16.4.2

Divide-and-conquer can determine the maximum of an array by:

1. dividing the array into two halves

2. recursively finding the maximum of each half

3. choosing the maximum of the two halves

Give an algorithm to find the maximum of an array using divide-and-conquer.

Hint: the middle of the array with starting index s and ending index e is:

mid = ⌊( s + e ) / 2⌋

Is it really greedy? No.

Though the solution to the problem is in the subproblems, all subproblems are solved.

There is no choice about which subproblem to solve. Ideally greed would allow non-optimal subproblems to be identified and ignored.

Discrete Knapsack - no fractional items

We saw that dynamic programming yielded an optimal solution in time O(nW).

Greed does not always yield an optimal solution, only when the greedy choice property holds.

Consider the discrete knapsack problem using the greedy strategy:
• always choose the item with the maximum value/weight ratio that will fit in the sack.

There are exactly three items.

The sack can carry 50 pounds of weight.

We want as much value as the sack can carry.

The optimal solution Item 2 and 3=\$100+\$120=\$220, see (b) above.

Item 1 would be chosen under the greedy strategy because it is the maximum value/weight ratio of 6.

Any solution with Item 1 is suboptimal.

 item 1 2 3 weight 10 20 30 value \$60 \$100 \$120 value weight 6 5 4

Fractional Knapsack

Value/weight ratio:

item 1 = 60/10 = 6

item 2 = 100/20 = 5

item 3 = 120/30 = 4

Discrete knapsack fails in Figure (b) using greedy strategy

"always choose the item with the maximum value/weight ratio that will fit in the sack"

choosing items 1 and 2 = \$160 rather than items 2 and 3 = \$220.

If we can take a fractional part of an item, as in Figure (c) where we take 20/30 of item 3 (20 pounds), the greedy strategy

"always choose the item with the maximum value/weight ratio that will fit in the sack"

yields an optimal solution.

Question 16.5 - Why does the strategy fail to yield an optimal in the discrete but does in the fractional?

Answer: Because in the discrete case, the greedy strategy does not always fill the sack to full capacity but does in the fractional.

The underlying problem in the discrete case is that when picking an item for inclusion in the sack:

we must compare the solution to:

• the solution of a subproblem in which the item is included with

• the solution of a subproblem in which the item is excluded

before we can make a choice.

The greedy method would not exclude an item to compare with the case of including the item.

We cannot solve a locally optimal subproblem that leads to a globally optimal solution using the greedy algorithm in the discrete case.

In the discrete case, there are many overlapping subproblems as we saw in dynamic programmed solution in Chapter 15.

We must be able to compare the results from the case using the greedy strategy choice with that of another that did not use the greedy strategy choice.

We'll see how to do just that in best-first algorithms.

 int knapsack( capacity ) 1 max ← 0 2 for i ← 1 to N -- N number of items 3 if space ← capacity - size[ i ] ≥ 0 then -- Room for item i 4 if  tempmax ← knapsack( space ) + value[ i ] > max then -- Compare solutions 5 max ← tempmax -- keeping the best 6 return max