Chapter 11 Answers |
Document last modified: |
Specialized Universe of Keys
Question 11.0
The load factor a of a direct-addressed data structure is often given as: a = n/m
n = number of keys used, K = { 2, 3, 5, 8 }
m = number of keys in universe, U = { 0, 1, .., 9 }
What is the load (utilization) of the example. a = n/m = 4/10 = .4
What is the minimum load factor? a = n/m = 0/m = 0
What is the maximum load factor? Items may not occupy same slot: a = n/m = 10/10= 1.
Question 11.1
Diagram a data structure for direct-addressing to implement a dictionary of dates translating from historical dates to English representation, for example: 1492 ® fourteen ninety two
T
1490 1491 1492 1493 1494 1495 1496 NIL NIL 319 NIL NIL NIL NIL Address 319
english key fourteen ninety two 1492 Which is the key? 1492 satellite data? fourteen ninety two
What does T, direct address table, hold? Addresses to satellite data.
Give a call to insert 1492 ® fourteen ninety two.
Year x = new Year( 1492, "fourteen ninety two")
Insert( T, x )
Give a call to search for the translation of 1492 ® fourteen ninety two.
x = Search( T, 1492)
Give a call to delete 1492 ® fourteen ninety two.
Delete( T, x );
Question 11.2
Diagram T. See above. What is stored at each index of T? Address of a Year.
Are the keys totally ordered? Yes.
Is the key field of Year redundant? No. It is needed in Delete.
Performance
Question 11.3
Describe a procedure to find the maximum key used by a dynamic set S represented by a direct-address table T of length m.
MaximumKey( T )
-- pre: T is not empty
-- post: return maximum key in T
i=m
while i>1 and T[i] º NIL do i ¬ i-1
return i
What is the best and worst case performance? W(1) and O(m)
Describe a procedure to find the maximum element of dynamic set S represented by a direct-address table T of length m.
MaximumElement( T ) -- Assumes > property on elements
-- pre: T is not empty
-- post: return maximum element in T
max = MaximumKey( T )
for i=1 to m do
if T[i] ¹ NIL and T[i] > T[max] then max ¬ i
return T[ max ]
What is the best and worst case performance? O(m)
Question 11.4
Give one other situation when direct addressing as implemented does not work. When the key is not an integer. Suggest a solution. Define a mapping of key to 0..m where m is the number of keys.
What is U for the Java program? {0..9999}
What is K for the Java program? {1492, 2006}
What is the maximum load factor? a = n/m = 2/10000 = .0002
What are the minimum storage requirements for the direct-addressing? 10000+2
Hashing
Question 11.5
Does knowing the hash provide much useful information about the message? Not for industrial strength hashes, those with a large number of possible hash values. If the message vocabulary was restricted, say to a 2-digit number and the hash algorithm were known, the hash value could be reversed mapped back to the message.
Can one include the hash as part of the hashed message or must it be separate? It must be separate from the hashed message.
Collision resolution by chaining
Insertion: Always add to front of list, worst case time of O(1).
Question 11.6 - Why O(1)? Hash is O(1), insertion is O(1).
Figure at below shows singly linked lists. If we want to delete elements, it’s better to use doubly linked lists.
Question 11.7 - Why? Need previous and next pointers for deletion. In doubly linked lists each node has both. Deleting k2 requires changing next pointer of previous node, k5 to k7.
Implementation
Question 11.8 - What are the keys in one of the buckets above? k2, k5, k7
Doubly linked
Question 11.9 - Use h( k ) = k mod 2
Diagram a data structure for hashing to implement a dictionary of dates translating from historical dates to English words, for example:
1492 ® fourteen ninety two
previous
NILkey
1492data
fourteen ninety twonext
NILWhich is the key? satellite data? What does T hold? Address of the node.
Construct the hash table for keys: 2005, 1492, 1984, 1776, 2013
®
previous
NILkey
1492data next
®
¬
previouskey
1984data next
®
¬
previouskey
1776data next
NIL
®
previous
NILkey
2005data next
®
¬
previouskey
2013data next
NILInsert a different node with key 2005. Any problems? Although "y: Node_Ptr, y Î T's nodes: x ¹ y, there are two identical keys. Only the first will be found by the search.
®
previous
NILkey
1492data next
®
¬
previouskey
1984data next
®
¬
previouskey
1776data next
NIL
®
previous
NILkey
2005data next
®
previous
¬key
2005data next
®
¬
previouskey
2013data next
NILWhat about inserting the same node twice? Violates precondition: "y: Node_Ptr, y Î T's nodes: x ¹ y
Search for 1492, 2002 and 2005. Any problems? 2002 returns NIL, 2005 returns first key found.
Delete 1492, 2005 and 2002. Any problems? Deleting 2002 violates preconditions: $y: Node_Ptr, y Î T's nodes: x = y
®
previous
NILkey
1984data next
®
¬
previouskey
1776data next
NIL
®
previous
NILkey
2005data next
®
¬
previouskey
2013data next
NIL
Performance
Question 11.10 - What is implied by: a = 1, a < 1, a > 1?
a is the average load. While Simple Uniform Hashing means that each element is equally likely to hash into any of the m slots, the load factor does not provide any information about the actual distribution; all actual elements may hash to the same slot.
If keys are uniformly distributed then a implies the average length of the chains.
Chained-Hash-Delete
Question 11.11 - What is the problem with singly linked lists?
To locate the previous node of the one to delete, must start at the slot and iterate to node to delete, an O(n) operation versus Q(1) for doubly linked.
Chained-Hash-Search:
Question 11.12 - Would using a binary search reduce search to Q(lg n)? If representing using an array. Linked lists are preferable, why? For Q(1) insert.
Unsuccessful - contains no element with key k
Theorem - An unsuccessful search takes expected time Q(1 + a)
Proof Simple uniform hashing ® any key not already in the table is equally likely to hash to any of the m slots.
To search unsuccessfully for any key k, need to search to the end of the list T[ h( k ) ].
This list has expected length E [nh(k)] = a. Therefore, the expected number of elements examined in an unsuccessful search is a.
Adding in the time to compute the hash function, the total time required is Q(1 + a).
Question 11.13 - What is Q(1 + a) when:
n = 10, m = 100 Q(1+1/10) = Q(1)
n = m3 Q(m3/m) = Q(m2)
Successful - T contains element with key k
A total of 7 key comparisons would be done to locate each of the 6 elements once.
Question 11.14 - Explain. For the table above, slots with 1 element require 1 comparison; slot 4 will require 1 comparison for 1492 and 2 comparisons for 1812.
Hash functions
Question 11.15 Give an example where this might not satisfy simple uniform hashing. Student ID or other numbers that have persistent patterns in certain digits.
Question 11.16
What are some of the criteria for key selection?
To distinguish one element from another, keys must be unique.
Can be quickly hashed somewhat uniformly to a natural number.
Keys should be independent.
Would a key based on the year of birth for first-graders be useful? No, nearly all would be the same.
Would a hash function with m = 26 and a hash function that extracted the first vowel of a name satisfy the Simple Uniform Hashing assumption?
For example: h( k ) = (first vowel of k) mod 26
No, there are only 5 vowels (aeiou) so that all names would hash to 5 slots. This is not a hash function but direct addressing.
A hash that extracted the month of birth from the key?
For example: h( k ) = month of birth in k
Yes, even though births are not precisely uniformly distributed.
Question 11.17 - Are powers of 10 for integer keys bad? Give an example.
Powers of 10 are bad. If m = 10p for integer p, then h(k) is just the least significant p digits of k.100 = 102
CLRS as (67 · 1003)+ (76 · 1002)+ (82 · 1001)+ (83 · 1000) mod 102 = 83
ABCS as (65 · 1003)+ (66 · 1002)+ (67 · 1001)+ (83 · 1000) mod 102 = 83
Question 11.18 - Show that m=9 is a bad choice for integer keys (i.e. radix 101). Use 123 and 321.
CLRS as (67 · 103)+ (76 · 102)+ (82 · 101)+ (83 · 100) mod 9 = 2
SRLC as (83 · 103)+ (82 · 102)+ (76 · 101)+ (67 · 100) mod 9 = 2
123 mod 9 = 6
321 mod 9 = 6
Question 11.19 - Find h( k ) for:
m = 16 (implies m = 24, p = 4)
w = 8
k = 21
s = 32 Must have 0 < s < 28; choose s = 32 ® A = s / 28
- A = 1/8
- k·A = 21/8
- k·A mod 1 = 5/8
- m · (k·A mod 1) = 16*5/8 = 10
- h( k ) = 10 h(k) = ëm · (k·A mod 1)û
Question 11.20 - Find h( k ) for:
m = 4 (implies m = 22, p = 2)
w = 3
k = 12
s = 5 0 < s < 2w = 23 = 8
k · s = 12 · 5 = 60 = 7 · 23 + 4 = r1 . r0 r1
r0= 7 · 23
= 4 = 1002
- Written in w = 3 bits, r0 = 1002
- The p = 2 most significant bits of r0 is 102 or 210, so h(12) = 2
Question 11.21 - What is the final bit operation needed to recover 14 bit h(123456) from 32 bit r0?
Right logical shift by 32-14 = 18 bits.
- r0 = 000000010000110011000000010000002
- p = 14, so take the 14 most significant bits of r0
000000010000110011000000010000002 with 18 >> is 000000000000000000000000010000112