Chapter 11 Answers |
Specialized Universe of Keys
Question
The load factor a of a direct-addressed data structure is often given as: a = n/m
n = number of keys used, K = { 2, 3, 5, 8 }
m = number of keys in universe, U = { 0, 1, .., 9 }
What is the load (utilization) of the example. a = n/m = 4/10 = .4
What is the minimum load factor? a = n/m = 0/m = 0
What is the maximum load factor? Items may not occupy same slot: a = n/m = 10/10= 1.
Question
Diagram a data structure for direct-addressing to implement a dictionary of dates translating from historical dates to English representation, for example: 1492 ® fourteen ninety two
T
1490 1491 1492 1493 1494 1495 1496 NIL NIL 319 NIL NIL NIL NIL Address 319
english key fourteen ninety two 1492 Which is the key? 1492 satellite data? fourteen ninety two
What does T, direct address table, hold? Addresses to satellite data.
Give a call to insert 1492 ® fourteen ninety two.
Year x = new Year( 1492, "fourteen ninety two")
Insert( T, x )
Give a call to search for the translation of 1492 ® fourteen ninety two.
x = Search( T, 1492)
Give a call to delete 1492 ® fourteen ninety two.
Delete( T, x );
Question
Diagram T. See above. What is stored at each index of T? Address of a Year.
Are the keys totally ordered? Yes.
Is the key field of Year redundant? No. It is needed in Delete.
Performance
Question
Describe a procedure to find the maximum key used by a dynamic set S represented by a direct-address table T of length m.
MaximumKey( T )
-- pre: T is not empty
-- post: return maximum key in T
i=m
while i>1 and T[i] º NIL do i ¬ i-1
return i
What is the best and worst case performance? W(1) and O(m)
Describe a procedure to find the maximum element of dynamic set S represented by a direct-address table T of length m.
MaximumElement( T ) -- Assumes > property on elements
-- pre: T is not empty
-- post: return maximum element in T
max = MaximumKey( T )
for i=1 to m do
if T[i] ¹ NIL and T[i] > T[max] then max ¬ i
return T[ max ]
What is the best and worst case performance? O(m)
Question
Give one other situation when direct addressing as implemented does not work. When the key is not an integer. Suggest a solution. Define a mapping of key to 0..m where m is the number of keys.
What is U for the Java program? {0..9999}
What is K for the Java program? {1492, 2006}
What is the maximum load factor? a = n/m = 2/10000 = .0002
What are the minimum storage requirements for the direct-addressing? 10000+2
Hashing
Question
Does knowing the hash provide much useful information about the message? Not for industrial strength hashes, those with a large number of possible hash values. If the message vocabulary was restricted, say to a 2-digit number and the hash algorithm were known, the hash value could be reversed mapped back to the message.
Can one include the hash as part of the hashed message or must it be separate? It must be separate from the hashed message.
Collision resolution by chaining
Insertion: Always add to front of list, worst case time of O(1).
Question - Why O(1)? Hash is O(1), insertion is O(1).
Figure at below shows singly linked lists. If we want to delete elements, it’s better to use doubly linked lists.
Question - Why? Need previous and next pointers for deletion. In doubly linked lists each node has both. Deleting k2 requires changing next pointer of previous node, k5 to k7.
Implementation
Question - What are the keys in one of the buckets above? k2, k5, k7
Doubly linked
Question - Use h( k ) = k mod 2
Diagram a data structure for hashing to implement a dictionary of dates translating from historical dates to English words, for example:
1492 ® fourteen ninety two
previous
NILkey
1492data
fourteen ninety twonext
NILWhich is the key? satellite data? What does T hold? Address of the node.
Construct the hash table for keys: 2005, 1492, 1984, 1776, 2013
®
previous
NILkey
1492data next
®
¬
previouskey
1984data next
®
¬
previouskey
1776data next
NIL
®
previous
NILkey
2005data next
®
¬
previouskey
2013data next
NILInsert a different node with key 2005. Any problems? Although "y: Node_Ptr, y Î T's nodes: x ¹ y, there are two identical keys. Only the first will be found by the search.
®
previous
NILkey
1492data next
®
¬
previouskey
1984data next
®
¬
previouskey
1776data next
NIL
®
previous
NILkey
2005data next
®
previous
¬key
2005data next
®
¬
previouskey
2013data next
NILWhat about inserting the same node twice? Violates precondition: "y: Node_Ptr, y Î T's nodes: x ¹ y
Search for 1492, 2002 and 2005. Any problems? 2002 returns NIL, 2005 returns first key found.
Delete 1492, 2005 and 2002. Any problems? Deleting 2002 violates preconditions: $y: Node_Ptr, y Î T's nodes: x = y
®
previous
NILkey
1984data next
®
¬
previouskey
1776data next
NIL
®
previous
NILkey
2005data next
®
¬
previouskey
2013data next
NIL
Performance
Question - What is implied by: a = 1, a < 1, a > 1?
a is the average load. While Simple Uniform Hashing means that each element is equally likely to hash into any of the m slots, the load factor does not provide any information about the actual distribution; all actual elements may hash to the same slot.
If keys are uniformly distributed then a implies the average length of the chains.
Chained-Hash-Delete
Question - What is the problem with singly linked lists?
To locate the previous node of the one to delete, must start at the slot and iterate to node to delete, an O(n) operation versus Q(1) for doubly linked.
Chained-Hash-Search:
Question - Would using a binary search reduce search to Q(lg n)? If representing using an array. Linked lists are preferable, why? For Q(1) insert.
Unsuccessful - contains no element with key k
Theorem - An unsuccessful search takes expected time Q(1 + a)
Proof Simple uniform hashing ® any key not already in the table is equally likely to hash to any of the m slots.
To search unsuccessfully for any key k, need to search to the end of the list T[ h( k ) ].
This list has expected length E [nh(k)] = a. Therefore, the expected number of elements examined in an unsuccessful search is a.
Adding in the time to compute the hash function, the total time required is Q(1 + a).
Question - What is Q(1 + a) when:
n = 10, m = 100 Q(1+1/10) = Q(1)
n = m3 Q(m3/m) = Q(m2)
Successful - T contains element with key k
A total of 7 key comparisons would be done to locate each of the 6 elements once.
Question - Explain. For the table above, slots with 1 element require 1 comparison; slot 4 will require 1 comparison for 1492 and 2 comparisons for 1812.
Hash functions
Question Give an example where this might not satisfy simple uniform hashing. Student ID or other numbers that have persistent patterns in certain digits.
Question
What are some of the criteria for key selection?
To distinguish one element from another, keys must be unique.
Can be quickly hashed somewhat uniformly to a natural number.
Keys should be independent.
Would a key based on the year of birth for first-graders be useful? No, nearly all would be the same.
Would a hash function with m = 26 and a hash function that extracted the first vowel of a name satisfy the Simple Uniform Hashing assumption?
For example: h( k ) = (first vowel of k) mod 26
No, there are only 5 vowels (aeiou) so that all names would hash to 5 slots. This is not a hash function but direct addressing.
A hash that extracted the month of birth from the key?
For example: h( k ) = month of birth in k
Yes, even though births are not precisely uniformly distributed.