## Chapter 16 - Greedy Algorithms

**Overview**

Greedy algorithms are an approach to solving certain kinds of optimization problems. Greedy algorithms are similar to dynamic programming algorithms in that the solutions are both efficient and optimal if the problem exhibits a particular sort of substructure.

A greedy algorithm builds a solution one step at a time through the feasible solutions, applying a heuristic to determine the best choice. A heuristic applies an insight to solving the problem, such as always choosing the largest or smallest element.

**Elements of the greedy strategy**

Greedy algorithms should be applied to problems exhibiting these two properties:

greedy choice property: a globally optimal solution can be arrived at by making locally optimal (greedy) choices. When a locally optimal choice can lead to a globally optimal solution, the problem has this property. Applied to problems that don't have this property, such as the Traveling (now tired) Salesman Problem, greedy algorithms may not find the optimal solution, but might find one that is "good enough" for some applications, especially when the alternative is an exponential-time brute-force solution.

optimal substructure: as in dynamic programming, an optimal solution to a problem contains within it optimal solutions to subproblems.

Example: Recall Dijkstra's Algorithm: finding the shortest path (z, x) was a subproblem of finding the shortest path (s, x).

Heuristic - a criterion used to determine the choice most likely to lead to an optimal solution. A greedy algorithm applies the heuristic to make the best choice at that moment.

- Recall the knapsack problem; the heuristic was to always use the maximum value for a given capacity.

- For Dijkstra's Algorithm, the heuristic was to always choose the shorter of two paths.

- Shortest path used the RELAX heuristic: choose the smallest cost to reach a vertex.

In both cases, the local use of the heuristic led to a globally optimal solution.

Optimal - the most advantageous:

locally optimal - when a better solution can occur elsewhere.

globally optimal - when the solution is the best possible.

Solution space - the set of all possible solutions. Sorts have only one valid solution but n! possible solutions (permutations) in the solution space to consider.

Making change for $1 has many valid solutions (100 pennies, 20 nickels) but only one solution is optimal for the fewest coins (2 half dollars).
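The change-making example can be sketched as a greedy loop (a minimal sketch; the coin set, including the half dollar, is an assumption for illustration):

```python
# Greedy change-making: always take the largest coin that still fits.
# Coin values are in cents; listing them largest-first drives the greedy choice.
def greedy_change(amount, coins=(50, 25, 10, 5, 1)):
    """Return a list of coin values summing to amount, largest-first."""
    result = []
    for coin in coins:                 # coins are ordered largest to smallest
        while amount >= coin:
            amount -= coin
            result.append(coin)
    return result

print(greedy_change(100))  # [50, 50] -- the two-coin optimum for $1
```

For this coin system the greedy choice happens to be optimal; for arbitrary coin sets it is not.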

Example: The solution space for all 3-bit binary numbers could be represented by a cube, at right, where each vertex corresponds to one of the numbers.

Searching for the maximum 3-bit value could be done by starting at an arbitrary vertex and, by some means (a heuristic), determining which vertex to examine next. A greedy algorithm heuristic might always choose the largest-valued adjacent vertex in a maximization search.

Question 16.1

- Starting at vertex 100, apply a greedy search using the heuristic "choose the largest adjacent vertex, stop if all are smaller" to find the maximum 3-bit value on the upper-right figure.

- Did a locally optimal choice lead to a globally optimal solution?

- Repeat the method starting at vertex 000.

- Did a locally optimal choice lead to a globally optimal solution?

- Does this problem have the greedy-choice property, that a globally optimal solution can be arrived at by making locally optimal (greedy) choices?
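To experiment with Question 16.1, the greedy search on the cube can be sketched as follows (a minimal sketch; the adjacency rule, vertices whose labels differ in exactly one bit, is assumed from the cube figure):

```python
# Greedy hill-climb on the 3-bit cube: vertices are labeled 0..7, and two
# vertices are adjacent when their binary labels differ in exactly one bit.
def greedy_max(start):
    """Follow 'choose the largest adjacent vertex; stop if all are smaller'."""
    current = start
    while True:
        neighbors = [current ^ (1 << b) for b in range(3)]  # flip each bit
        best = max(neighbors)
        if best <= current:
            return current                                  # local maximum
        current = best

# Try the heuristic from every starting vertex, including 100 and 000.
for v in range(8):
    print(f"start {v:03b} -> stop {greedy_max(v):03b}")
```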

**Huffman codes for optimal data compression**

Why do Huffman codes matter? From Algorithms by Dasgupta, Papadimitriou, and Vazirani: in the MP3 compression scheme, a sound signal is encoded in three steps.

- The analog signal is digitized by sampling at regular intervals, yielding a sequence of real numbers s_1, s_2, ..., s_T. The figure at right illustrates encoding an analog 0-to-1 signal as a 000_2 to 111_2 value, where 0.5 to 0.625 is encoded as 100_2. The figure below illustrates the encoding of an analog signal (wavy curve) to a discrete value (bar). At a sampling rate of 44,100 samples per second, a mono-channel 50-minute performance would correspond to T = 50 × 60 × 44,100 ≈ 130,000,000 measurements.

- Each sample s_t is quantized: approximated by a nearby number from a finite set Γ. This set is carefully chosen to exploit human perceptual limitations, with the intention that the approximating sequence is indistinguishable from the original s_1, s_2, ..., s_T by the human ear.

- The resulting string of length T over the alphabet Γ is encoded in binary.

Example - Characters are commonly represented by the same number of bits, e.g., the 7-bit ASCII code.

However, some characters occur more frequently in English (or in any language with an alphabet) than others.

Using a variable number of bits per codeword, so that frequent characters use fewer bits and infrequent characters use more bits, we can decrease the space needed to store the same information.

For example, consider the following sentence:

"dead beef cafe deeded dad. dad faced a faded cab. dad acceded. dad be bad."Frequencies for the 77 characters are:

a b c d e f space . 12 4 5 19 12 4 17 4 Question 16.2 -Which characters of the sentence should we encode in the fewest bits for compression?Example encoding

If we use a fixed-length code like this:

    000 (space)   001 a   010 b   011 c   100 d   101 e   110 f   111 .

then the sentence, which is of length 77, consumes 77 × 3 = 231 bits. But if we use a variable-length code like this:

    100 (space)   110 a   11110 b   1110 c   0 d   1010 e   11111 f   1011 .

then we can encode the text in 3×12 + 4×5 + 5×4 + 19×1 + 12×4 + 4×5 + 17×3 + 4×4 = 230 bits. A savings of 1 bit. Not much, but a start.
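The two bit counts can be checked with a few lines of arithmetic (a minimal sketch using the frequency table above):

```python
# Cost of an encoding = sum over characters of (codeword length * frequency).
freq = {'a': 12, 'b': 4, 'c': 5, 'd': 19, 'e': 12, 'f': 4, ' ': 17, '.': 4}

# Fixed-length code: every character costs 3 bits.
fixed_bits = 3 * sum(freq.values())

# Codeword lengths of the variable-length code given above.
var_len = {' ': 3, 'a': 3, 'b': 5, 'c': 4, 'd': 1, 'e': 4, 'f': 5, '.': 4}
variable_bits = sum(var_len[ch] * freq[ch] for ch in freq)

print(fixed_bits, variable_bits)  # 231 230
```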

Prefix code - Note that such a code must be a prefix code, in which no codeword is a prefix of another; this is what lets us determine where one codeword stops and the next starts.

Example: Because of the encoding d = 0, no other codeword can have the prefix 0; all others must start with a 1. If the characters have non-uniform frequency distributions, then finding such a code can lead to great savings in storage space.

A process with this effect is called data compression. This can be applied to any data where the frequency distribution is known or can be computed, not just sentences in languages.

Examples: computer graphics, digitized sound, binary executables, etc. Optimal data compression yields savings from 25% to 90%.

There are many possible codes, but we would like to find one that is optimal with respect to the number of bits needed to represent the data.

How can we find such a code?

Huffman's Algorithm is a greedy algorithm that constructs an optimal encoding.

Example - an optimal encoding of 100,000 characters

[Figure: two prefix-code trees, non-optimal (left) and optimal (right). The left tree is not optimal, using a fixed 3-bit length; f is encoded as 101, or right, left, right, for a total of 300,000 bits with fixed-length 3-bit codewords. The right tree is optimal; f is encoded as 1100, or right, right, left, left, for a total of 224,000 bits with variable-length codewords. Both are prefix codes: no codeword is a prefix of another.]

Using fixed-length encoding: abc = 001010011

Using optimal encoding: abc = 0101100

Constructing a Huffman code

- Insight: lower-frequency characters should have longer bit encodings (deeper depth); start at the bottom with the lowest-frequency characters, recursively adding the lowest remaining characters.

- The smaller code (highest frequency) is the left branch, 0; the larger is the right branch, 1.

- Greedy strategy (heuristic): extract two values from a character-frequency min-heap, guaranteed to be the minimum remaining.

- The diagram lists the minimum codes of any two characters on the left branch, but they could be placed on the right at the same level with the same cost.

Question 16.3 - From Figure (b) above, why are the codes a = 0 and f = 1100?

Greedy strategy - why it works

greedy choice property: a globally optimal solution can be arrived at by making locally optimal (greedy) choices. A necessary condition.

- Greedy heuristic: extract the two lowest-frequency values from a character-frequency min-heap, guaranteed to be the minimum remaining.

- Combine the two into a subtree, summing the frequencies. The two lowest-frequency character subtrees will start and stay at the leaves of the code tree.

- Insert the subtree into the min-heap.

- Each time the lowest two values are extracted from the min-heap and combined, the lowest frequencies are guaranteed to be at the bottom of the tree and the highest at the top.

- The highest frequency will have the shortest code (length) while the lowest will have the longest.

The greedy choice property holds because the minimum subtree is selected from a min-heap and placed above lower-frequency subtrees.

Example - Don't confuse the min-heap, which maintains the priority queue, with the tree constructed containing the Huffman codes.

[Figure: a sequence of min-heap snapshots, one per step below.]

- Label each node of the min-heap with the frequency of the letter in the text to be compressed. The initial min-heap is used to identify the minimum character and its frequency, stored as character:frequency.

- Extract the 2 minimum-frequency subtrees from the min-heap, 5 and 9. Sum the frequencies at the parent and insert the subtree into the min-heap. The min-heap now contains a subtree with frequency 14 at the parent and children f and e.

- Extract the 2 minimum-frequency entries from the min-heap, 12 and 13. Sum the frequencies at the parent node and insert into the min-heap.

- Extract the 2 minimum-frequency entries, 14 and 16. Sum the frequencies at the parent node and insert into the min-heap.

- Extract the 2 minimum-frequency entries, 25 and 30. Sum the frequencies at the parent node and insert into the min-heap. Frequencies 45 and 55 remain.

- Extract the 2 minimum-frequency entries, 45 and 55. Sum the frequencies at the parent node and insert into the min-heap. The last iteration inserts the root of the encoding tree into the min-heap. The root node has value 100, the total number of characters. When finished, the min-heap contains one entry: the character encoding tree.

Algorithm - The number of bits needed to encode the data is the sum, for each character, of the number of bits in its code times its frequency.

Let

- T be the tree
- C the set of characters c that comprise the alphabet
- c.freq the frequency of character c
- d_T(c) the depth of character c in the tree

Since the number of bits in a character's codeword equals its depth in the binary tree, the sum we want to minimize, called the cost B(T) of the tree, is:

    B(T) = ∑_{c ∈ C} c.freq · d_T(c)

In the following algorithm, freq is defined as above; it can be stored efficiently in an array indexed by characters, extended as needed to accommodate the values of internal tree nodes. C is the set of characters, represented as leaf tree nodes. A priority queue Q of tree nodes extracts the minimum element with a min-heap. Build the tree in a bottom-up manner, starting with the individual characters and ending with the root of the tree as the only element of the queue:

    Huffman(C)
      -- pre: C is the characters that comprise the alphabet
      -- post: return the root of the optimal encoding tree
      n ← |C|
      Q ← C                 -- insert all elements of C into Q, using frequency as priority
      for i ← 1 to n-1 do
          z ← a new tree node
          x ← Extract-Minimum(Q)
          y ← Extract-Minimum(Q)
          z.left ← x
          z.right ← y
          z.freq ← x.freq + y.freq
          Insert(Q, z)
      return Extract-Minimum(Q)   -- the root of the encoding tree

At first, the min-queue contains all the leaf nodes as a "forest" of singleton binary trees. Then the two nodes with the least value are extracted from the queue, joined by a new tree node, and the new node is inserted into the queue.

After n-1 iterations, there is only one node left in the queue: the root of the tree.
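The procedure translates directly into Python, using the standard-library `heapq` module as the min-heap (a sketch; ties between equal frequencies may be broken differently than in the text's figures, which changes individual codewords but not the total cost):

```python
import heapq
from itertools import count

def huffman(freq):
    """Build an optimal prefix code from a {char: frequency} map.

    Returns {char: bitstring}.
    """
    tiebreak = count()
    # Heap entries are (frequency, tiebreaker, tree); a leaf is its character,
    # an internal node is a (left, right) pair. The tiebreaker keeps entries
    # comparable when frequencies are equal.
    heap = [(f, next(tiebreak), ch) for ch, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)     # two minimum-frequency subtrees
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))
    root = heap[0][2]

    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):           # internal node: 0 left, 1 right
            walk(node[0], prefix + '0')
            walk(node[1], prefix + '1')
        else:
            codes[node] = prefix or '0'       # lone-character edge case
    walk(root, '')
    return codes

freq = {'a': 12, 'b': 4, 'c': 5, 'd': 19, 'e': 12, 'f': 4, ' ': 17, '.': 4}
codes = huffman(freq)
cost = sum(len(codes[ch]) * freq[ch] for ch in freq)
print(cost)  # 212, the cost B(T) of an optimal tree for these frequencies
```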

Question 16.4.1 - Where's the greed?

Execute Huffman(C) on the following:

| c | f | . | b | c | a | e | space | d |
|--------|---|---|---|---|----|----|-------|----|
| c.freq | 4 | 5 | 6 | 8 | 12 | 13 | 17 | 19 |

What's the code for d and f?

Example - Let's go through the earlier example using Huffman's algorithm. Below are the contents of Q after each pass through the for loop.

Initially, all nodes are leaf nodes. Insert all 8 in Q:

    f:4   .:4   b:4   c:5   a:12   e:12   space:17   d:19

Join the two nodes with least value; now there are 7 things in Q:

    b:4   c:5   a:12   e:12   space:17   d:19

      8
     / \
    f   .
    4   4

Then the next two with least value; Q has 6 elements:

    a:12   e:12   space:17   d:19

      8      9
     / \    / \
    f   .  b   c
    4   4  4   5

Now the two nodes with least values are the two trees just made, and Q has 5 elements:

    a:12   e:12   space:17   d:19

        17
       /  \
      8    9
     / \  / \
    f  .  b  c
    4  4  4  5

Q has 4 elements:

    space:17   d:19

        17           24
       /  \         /  \
      8    9       a    e
     / \  / \     12   12
    f  .  b  c
    4  4  4  5

Three items left:

    d:19

      24             34
     /  \           /  \
    a    e        17    space
   12   12       /  \     17
                8    9
               / \  / \
              f  .  b  c
              4  4  4  5

Two big trees left:

         34             43
        /  \           /  \
      17    space     d    24
     /  \     17     19   /  \
    8    9               a    e
   / \  / \             12   12
  f  .  b  c
  4  4  4  5

Finally, join the whole thing:

              77
            /    \
          34      43
         /  \    /  \
       17  space d    24
      /  \   17  19  /  \
     8    9         a    e
    / \  / \       12   12
   f  .  b  c
   4  4  4  5

At each point, the heuristic chooses the joining that forces less frequent characters to be deeper in the tree.

So an optimal prefix code is:

    01 (space)   110 a   0010 b   0011 c   10 d   111 e   0000 f   0001 .

And B(T) = 17×2 + 12×3 + 4×4 + 5×4 + 19×2 + 12×3 + 4×4 + 4×4 = 212 bits, a savings of about 8% over the 231-bit fixed-length encoding. Recall that our initial variable-length code, 100 (space) 110 a 11110 b 1110 c 0 d 1010 e 11111 f 1011 ., encodes the same text in 3×12 + 4×5 + 5×4 + 19×1 + 12×4 + 4×5 + 17×3 + 4×4 = 230 bits.
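To see the prefix property in action, here is a sketch that encodes and decodes text with the optimal code above (the helper names `encode` and `decode` are illustrative):

```python
# The optimal prefix code found above. Because no codeword is a prefix of
# another, a decoder can scan the bit string left to right and emit a
# character the moment the bits read so far match a codeword.
codes = {' ': '01', 'a': '110', 'b': '0010', 'c': '0011',
         'd': '10', 'e': '111', 'f': '0000', '.': '0001'}
decode_table = {bits: ch for ch, bits in codes.items()}

def encode(text):
    return ''.join(codes[ch] for ch in text)

def decode(bits):
    out, buffer = [], ''
    for bit in bits:
        buffer += bit
        if buffer in decode_table:    # a complete codeword: emit and reset
            out.append(decode_table[buffer])
            buffer = ''
    return ''.join(out)

print(encode('dad'))                     # 1011010
print(decode(encode('dead beef cafe')))  # dead beef cafe
```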

**Divide-and-conquer**

Divide-and-conquer naturally divides problems into subproblems. When one subproblem has a better solution than another, we can construct a greedy divide-and-conquer algorithm. As we will see later, this is an important technique in multithreaded (parallel) algorithms.

Question 16.4.2 - Divide-and-conquer can determine the maximum of an array by:

- dividing the array into two halves

- recursively finding the maximum of each half

- choosing the maximum of the two halves

Give an algorithm to find the maximum of an array using divide-and-conquer.

Hint: the middle of the array with starting index s and ending index e is: mid = ⌊(s + e) / 2⌋

Is it really greedy? No.

Though the solution to the problem is in the subproblems, all subproblems are solved.

There is no choice about which subproblem to solve. Ideally, greed would allow non-optimal subproblems to be identified and ignored.
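A sketch of the divide-and-conquer maximum described in Question 16.4.2 (the name `dc_max` is illustrative):

```python
# Divide-and-conquer maximum: split the range in half, solve both halves
# recursively, and combine by taking the larger of the two results.
def dc_max(a, s, e):
    """Maximum of a[s..e], inclusive."""
    if s == e:
        return a[s]
    mid = (s + e) // 2        # the hint's midpoint: floor((s + e) / 2)
    return max(dc_max(a, s, mid), dc_max(a, mid + 1, e))

print(dc_max([3, 9, 1, 7, 4], 0, 4))  # 9
```

Both recursive calls are always made; no subproblem is skipped, which is the sense in which this is not greedy.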

**Discrete Knapsack - no fractional items**

We saw that dynamic programming yielded an optimal solution in time O(nW).

Greed does not always yield an optimal solution; it does only when the greedy choice property holds.

Consider the discrete knapsack problem using the greedy strategy:

- always choose the item with the maximum value/weight ratio that will fit in the sack.

There are exactly three items. The sack can carry 50 pounds of weight. We want as much value as the sack can carry.

The optimal solution is items 2 and 3 = $100 + $120 = $220; see (b) above. Item 1 would be chosen under the greedy strategy because it has the maximum value/weight ratio of 6. Any solution with item 1 is suboptimal.

| item | 1 | 2 | 3 |
|--------------|-----|------|------|
| weight | 10 | 20 | 30 |
| value | $60 | $100 | $120 |
| value/weight | 6 | 5 | 4 |
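A sketch of the ratio-first greedy choice on these three items, showing the suboptimal $160 result:

```python
# Greedy choice for the discrete (0/1) knapsack: take whole items in order
# of value/weight ratio (6, 5, 4) as long as they fit.
items = [(10, 60), (20, 100), (30, 120)]   # (weight, value) for items 1..3

def greedy_knapsack(items, capacity):
    total = 0
    for weight, value in sorted(items, key=lambda it: it[1] / it[0], reverse=True):
        if weight <= capacity:             # the item must fit whole
            capacity -= weight
            total += value
    return total

print(greedy_knapsack(items, 50))  # 160 -- suboptimal; items 2 and 3 give 220
```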

**Fractional Knapsack**

Value/weight ratio:

item 1 = 60/10 = 6

item 2 = 100/20 = 5

item 3 = 120/30 = 4

The discrete knapsack fails in Figure (b) using the greedy strategy "always choose the item with the maximum value/weight ratio that will fit in the sack": it chooses items 1 and 2 = $160 rather than items 2 and 3 = $220.

If we can take a fractional part of an item, as in Figure (c) where we take 20/30 of item 3 (20 pounds), the same greedy strategy yields an optimal solution.
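The same ratio-first heuristic, allowed to take fractions, reaches the optimum (a minimal sketch):

```python
# Fractional knapsack: the same value/weight-ratio heuristic, but a
# fraction of an item may be taken, so the sack is always filled.
items = [(10, 60), (20, 100), (30, 120)]   # (weight, value)

def fractional_knapsack(items, capacity):
    total = 0.0
    for weight, value in sorted(items, key=lambda it: it[1] / it[0], reverse=True):
        take = min(weight, capacity)       # whole item, or the fraction that fits
        total += value * take / weight
        capacity -= take
        if capacity == 0:
            break
    return total

print(fractional_knapsack(items, 50))  # 240.0 = 60 + 100 + (20/30) * 120
```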

Question 16.5 - Why does the strategy fail to yield an optimal solution in the discrete case but succeed in the fractional case?

Answer: Because in the discrete case, the greedy strategy does not always fill the sack to full capacity, but in the fractional case it does.

The underlying problem in the discrete case is that when picking an item for inclusion in the sack, we must compare:

- the solution of the subproblem in which the item is included, with

- the solution of the subproblem in which the item is excluded

before we can make a choice. The greedy method never excludes an item to compare against the case of including it. So in the discrete case we cannot solve a locally optimal subproblem that leads to a globally optimal solution using a greedy algorithm.

In the discrete case, there are many overlapping subproblems, as we saw in the dynamic programming solution in Chapter 15. We must be able to compare the result of the greedy strategy's choice with that of an alternative that did not use the greedy choice.

We'll see how to do just that in best-first algorithms.

    int knapsack( capacity )
        max ← 0
        for i ← 1 to N                                   -- N: number of items
            space ← capacity - size[i]
            if space ≥ 0 then                            -- room for item i
                tempmax ← knapsack( space ) + value[i]   -- compare solutions
                if tempmax > max then
                    max ← tempmax                        -- keeping the best
        return max
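The sketch above runs directly as Python. Note that, as written, the recursion may pick the same item more than once (there is no "remaining items" state), so on the three-item example it behaves as an unbounded knapsack:

```python
# Runnable version of the recursive sketch above. Each call may re-choose
# any item, so this solves the unbounded variant rather than the 0/1 knapsack.
size = [10, 20, 30]
value = [60, 100, 120]

def knapsack(capacity):
    best = 0
    for i in range(len(size)):
        space = capacity - size[i]
        if space >= 0:                               # room for item i
            best = max(best, knapsack(space) + value[i])
    return best

print(knapsack(50))  # 300: five copies of item 1 under the unbounded rules
```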