Huffman |
Document last modified: |
| c | m | n | o | p | q | r | s | t |
| f(c) | 3 | 5 | 9 | 4 | 2 | 7 | 6 | 8 |
Huffman (C) -- pre: C is the characters that comprise the alphabet -- post: return the root of the optimal encoding tree n ← |C| Q ← C -- insert all the elements of C into Q, using the frequency as the priority for i ← 1 to n-1 do z ← a new tree node x ← Extract-Minimum (Q) y ← Extract-Minimum (Q) left[ z ] ← x right[ z ] ← y f[z] ← f[x] + f[y] Insert (Q, z) return Extract-Minimum (Q) -- the root of the encoding tree |
Below are the contents of the min-heap Q after each step through the for loop:
Initially, all nodes are leaf nodes. Insert all 8 in Q:
| c | q | m | p | n | s | r | t | o |
| f(c) | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
Join the two nodes with least value; now there are 7 things in Q:
| c | p | n | s | r | t | o | |
| f(c) | 4 | 5 / \ q m 2 3 |
5 | 6 | 7 | 8 | 9 |
Then the next two with least value, Q has 6 elements:
| c | n | s | r | t | o | |
| f(c) | 5 | 6 | 7 | 8 | 9 / \ p 5 4 / \ q m 2 3 |
9 |
Then the next two with least value, Q has 5 elements:
| c | r | t | o | ||
| f(c) | 7 | 8 | 9 / \ p 5 4 / \ q m 2 3 |
9 | 11 / \ n s 5 6 |
Q has 4 elements:
| c | o | |||
| f(c) | 9 / \ p 5 4 / \ q m 2 3 |
9 | 11 / \ n s 5 6 |
15 / \ r t 7 8
|
Three items left:
| c | |||
| f(c) | 11 / \ n s 5 6 |
15 / \ r t 7 8
|
18 / \ o 9 9 / \ p 5 4 / \ q m 2 3 |
Two big trees left:
| c | ||
| f(c) | 18 / \ o 9 9 / \ p 5 4 / \ q m 2 3 |
26 / \ 11 15 / \ / \ n s r t 5 6 7 8 |
Finally, we join the whole thing up:
| c | |
| f(c) | __44 ___ / \ 18 26 / \ / \ o 9 11 15 9 / \ / \ / \ p 5 n s r t 4 / \ 5 6 7 8 q m 2 3 |
At each point, the heuristic chooses the joining that would force less frequent characters to be deeper in the tree.
So an optimal prefix code is:00 o 010 p 100 n 101 s 110 r 111 t 0110 q 0111 mAnd B(T) = 9*2+3*4+3*5+3*6+3*7+3*8+2*4+3*4 == 18+12+15+18+21+24+8+12 = 128 bits.A savings of 64% in the size of the data over 352 bits as 8-bit encoding.