Disjoint Sets

Overview

Disjoint sets - have no elements in common.

For example, the sets of alive and dead people are disjoint.

Suppose n items (student records, bank account records, whatever) each with unique keys from 1..n.

Keep the items in a collection of sets (disjoint sets) such that an item must occur in exactly one of those sets.

For example, to partition a set of students into "students with GPA >= 2.0" and "students with GPA < 2.0."

Or the collection may have many sets, e.g. different income levels of people.

The sets are mutually exclusive and include all the items.

Example - The disjoint sets of people that are alive and dead.

Alive = { Tom, Alice, Harry }

Dead = { Fred, John }

Representative

In disjoint sets, each set is identified by a representative that is some member of the set.

For convenience, we can choose this element to be the one with the smallest key, other characteristics could be used but we need to be able to always choose the same representative for the same set.

Convenient to think of the representative as the parent of the other items in the set, as in a tree.

Example - Alice could be the representative of the Alive set.

Only requirement is that as long as set does not change, Alice is always the representative.

In many situations in computer science, problems involving disjoint sets naturally arise such that the sets grow dynamically (i.e., during the course of an algorithm, sets change by merging (union).

Two important disjoint set operations are:

• FIND-SET (x) - determine which set an item with key x is in, i.e., return the key of the representative of the set x is in.

Can determine if two elements are in the same set: a FIND-SET on both elements and compare the return values.

If the same representative is returned, then the two items are in the same set.

Example - Assuming Alice is the representative of set Alive.

FIND-SET( Tom ) returns Alice

FIND-SET( Harry ) returns Alice

• UNION (x, y) - unite the sets containing elements x and y.

Example - Unite Dead and Alive sets.

UNION( Tom, John )

Example - CONNECTED-COMPONENTS algorithm using FIND-SET and UNION.

Graph Representation

G(V, E) - G is the pair (V, E)

V - finite set called the vertex set whose elements are called vertices

E - is a binary relation on V called the edge set, whose elements are called edges

Undirected Graph

The pairs in E are unordered pairs, i.e., there is no tail or head and there is no direction to the edge.

(u, v) ∈ E is the same edge as (v, u) ∈ E

 Example Vertex set = {1, 2, 3, 4, 5}Edge set = {(1,2)(2,1)(1,5)(5,1)(2,5)(5,2)(4,2)(2,4)(4,5)(5,4)(4,3)(3,4)(2,3)(3,2)}

CONNECTED-COMPONENTS - compute the connected components of an undirected graph G=(V, E).

The connected components of a graph are the subgraphs that are all mutually connected.

Connected components:

{a, b, c, d}

{e, g, f}

{h, i}

{j}

Basic idea:

• Initially all vertices are in singleton sets.

{ {a}, {b}, {c}, {d}, {e}, {f}, {g}, {h}, {i}, {j} }

• MAKE-SET( v ) makes a singleton set containing v - the representative is, of course, v.

• Call FIND-SET to determine if two connected vertices of an edge (x, y) are in same set (i.e. have same representative).

• If x and y are not in same set, call UNION operation to unite sets containing x and y.

• When FIND-SET has been called on all edges in E, UNION has placed all connected vertices in the same set.
SAME-COMPONENT - Determine whether two vertices are in the same connected component.
• Call FIND-SET on both vertices and see if their representative is the same.

• If so, then they are connected.

Question 21.0 - Note that for undirected graph edge (u, v) there is a (v, u). We need only consider one of the edges.

1. What is G=(V, E)?

2. What is G.V?

1. Is b ∈ G.V?

2. Is z ∈ G.V?

3. What is G.E?

1. Is (b, d) ∈ G.E?

2. Is (b, e) ∈ G.E?

4. What is MAKE-SET( a ) result?

5. Step through the execution of CONNECTED-COMPONENTS for graph G below.

1. What is FIND-SET( b ) FIND-SET( d ) for sets {b} and {d}?

2. What is UNION( b, d ) for sets {b} and {d}?

3. What is FIND-SET( b ) FIND-SET( d ) for set {b, d}?

6. After running CONNECTED-COMPONENTS(G), what is the result of:

SAME-COMPONENT( a, b)?

SAME-COMPONENT( a, e)?

 CONNECTED-COMPONENTS( G )    for each vertex v ∈ G.V         do MAKE-SET( v )    for each edge (u, v)  ∈ G.E         do if FIND-SET( u ) ≠ FIND-SET( v )                  UNION( u, v) SAME-COMPONENT(u, v)    if FIND-SET( u ) == FIND-SET( v )        return TRUE    else return FALSE

Array representation of disjoint sets

The text represents disjoint sets by linked lists but does not give any algorithms.

Here are simple algorithms to implement disjoint set operations represented by arrays.

The key idea is an array p[1..n] with an element for each of n items that is in some set.

The example above would have an array of 10 elements, one for each vertex in the graph, [a, b, ..., j].

Representative - If the item is a representative (parent) of some set, then the value of this element is its own array index.

5 is the representative.

Linked list - Otherwise the value is the index of another item in the array, giving rise to a linked list eventually ending in the parent (representative) item.

Representing the set in an array:

 Index i 5 7 10 12 Members p[ i ] 5 12 5 10

Example: For the following disjoint sets (indicated by their unique 1..n keys), we might have the following lists giving the set relationships:

 { 1 6 14 } { 2 13 3 } { 5 10 12 7 } { 4 11 15 9 }

To find the representative of a set, we just traverse the list until we reach the parent (which points to itself).

The representative used here is the minimum but could be any element as long as consistently returned.

Representing the set in an array:

 Index i 5 7 10 12 Members p[ i ] 5 12 5 10
An array representing the above forest would be:
 Index i 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 Members p[ i ] 1 2 13 4 5 1 12 0 15 5 4 10 2 6 11

Implementation assumes that p[1..n] is initialized to p[i] = i so each item is in its own singleton set and is its own set's representative.

FIND-SET - Assumes the representative points to itself.

 -- return the key of the representative of set xFIND-SET (x)    if p[ x ] ≠ x then return FIND-SET ( p[ x ] )   return x
 Index i 5 7 10 12 Members p[ i ] 5 12 5 10
`Question 21.1                                           `
• `What value is p[7]?`
• `Trace FIND-SET( 7 ) showing values of x.`
• `When does the worst case occur?`
• `What is the precondition?`
• `Give a non-recursive FIND-SET with equivalent worst case performance.`
 -- return the key of the representative of set xFIND-SET (x)    if p[ x ] = x then return x   return FIND-SET ( p[ x ] )
Convert tail-recursion to a while loop:

• change the if to a while

• change the test in the while to the not of the test used in the if (use De Morgan's Laws)

• change the recursive call into an assignment statement which modifies the value stored in the variable that is used to control the while loop

` `
`UNION - Assumes the representative points to itself. The resulting representative is not necessarily the minimum/maximum of the union.`
 -- join two sets containing items x and y together UNION (x, y)   a ← FIND-SET (x)       -- a is x's representative   b ← FIND-SET (y)       -- b is y's representative   p[ a ] ← b                   -- now b is a's parent, and FIND-SET (a) would return b
`Example    UNION( 7,  3)`
 a = 5 b = 2

a ← FIND-SET (7) = 5

b ← FIND-SET (3) = 2

p[ 5 ] ← 2                  -- now b is a's representative, and FIND (7) would return 2

 -- join two sets containing items x and y together UNION (x, y)   a ← FIND-SET (x)       -- a is x's representative   b ← FIND-SET (y)       -- b is y's representative   p[ a ] ← b                   -- now b is a's parent, and FIND-SET (a) would return b
`Question 21.2`
 i 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 p[ i ] 1 2 13 4 5 1 12 0 15 5 4 10 2 6 11
• `Trace the UNION( 3, 14 )`
• `Draw the new set as a linked list. `
• `Does this structure support the SAME-COMPONENT(6, 3) procedure?`
 SAME-COMPONENT(u, v)   if FIND-SET( u ) == FIND-SET( v )       return TRUE   else return FALSE
• ```What is the estimate of the worst case for UNION of 2 sets with a total of n elements? Why?

```
Worst-case analysis
• In the worst case, FIND-SET will take O(n) operations; this worst case would occur when we have one big set represented as a long linked list and try to FIND-SET the representative of the last item.

• UNION is bound by the same worst case time, since it calls FIND-SET twice or O(1) times.

Improvements

FIND-SET path compression

One obvious improvement would be to have all set members point directly to their representatives, instead of to other arbitrary members of the set.

Using path compression each time FIND-SET executed on a set element, make all set members directly point to the representative element.

By compressing the path from leaf to root, after a FIND-SET, no intermediate links between a leaf and the root, there is just one.

Every item along the path from leaf to root is directly linked to the root:

NEW
 FIND-SET ( x )   if p[ x ] ≠ x then p[ x ] ← FIND-SET( p[ x ] )   return p[x]
OLD
 FIND-SET (x)   if p[ x ] ≠ x then return FIND-SET ( p[ x ] )   return p[x]

The next time a FIND-SET is done on any element along this path, the parent will be returned in O(1) time.

Yields very good performance in practice, since any long chain will likely be broken quickly.

Example

Recall the old FIND-SET result of an earlier example returns 2:

FIND-SET( 3 )

 ⇒

The new FIND-SET has the effect of placing all pointers to the representative element of the set.

FIND-SET( 3 )

 Index i 2 3 13 Members p[ i ] 2 13 2

 Index i 2 3 13 Members p[ i ] 2 2 2

FIND-SET compresses the path of all set elements to point to the representative 2.

Example

Recall the result of an earlier example:

UNION( 7,  3)

a = 5

 Index i 5 7 10 12 Members p[ i ] 5 12 5 10

b = 2

 Index i 2 3 13 Members p[ i ] 2 13 2

 Index i 2 3 5 7 10 12 13 Members p[ i ] 2 13 2 12 5 10 2

The new FIND-SET has the effect of placing all pointers to the representative element of the set. Just suppose that we start with the above sets and execute:

 UNION (7, 3)   a ← FIND-SET (7) = 5      -- a is x's representative   b ← FIND-SET (3) = 2      -- b is y's representative    p[ 5 ] ← 2                        -- now b is a's parent, and FIND-SET (a) would return b
 b = 2                           a = 5

FIND-SET compresses the path of all set elements to point to the representative.

UNION still unites the sets via the representative.

Note that 5 no longer points to itself so is not the representative of {2, 13, 3, 5, 10, 12, 7} even though other elements point to 5.

 p[ 5 ] ← 2                  -- now b is a's representative, and FIND (7) would return 2

After UNION(7, 3)

FIND-SET( 10 ), would now point all links to the representative.

 Index i 2 3 5 7 10 12 13 Members p[ i ] 2 2 2 2 2 2 2

Question 21.3 - For the set at right:

• What is the cost in executions of the old FIND-SET(7) the first time it executes?

• What is the cost in executions of the new FIND-SET(7) the first time it executes?

• What is the cost in executions of the old FIND-SET(7) the second time it executes ?

• What is the cost in executions of the new FIND-SET(7) the second time it executes?

• What is the worst-case in executions for the new FIND-SET after the first execution?

NEW
 FIND-SET ( x )   if p[ x ] ≠ x then p[ x ] ← FIND-SET( p[ x ] )   return p[x]
OLD
 FIND-SET (x)   if p[ x ] ≠ x then return FIND-SET ( p[ x ] )   return p[x]

UNION by size, weighted-union heuristic

Each UNION pushes some tree down a level.

Means that the next FIND-SET done on a member of this subtree will take a little longer than it would have before the UNION.

Although the path is compressed during the FIND-SET, the damage is already done with the time spent doing the FIND-SET.

Example - Consider the cost of FIND-SET( 7 ) on the following uncompressed path:

The cost would be less if the larger set was the parent since 2 accesses required versus 1.

Can minimize the impact of this situation using a heuristic called union-by-size or weighted-union heuristic.

• Keep track of how many elements are in a subtree, and make the smaller subtree the child of the larger subtree.

• This ensures that the majority of the items in the new tree are unaffected by the UNION in terms of how long it takes to find the representative.

• With each item x we associate a count c[x] that contains the number of items in the tree rooted at x.

• When the sets first start out as singletons, they each have a count of one.

Example: 6 singleton sets, each with a count of 1.

 i 1 2 3 4 5 6 p[i] 1 2 3 4 5 6 c[i] 1 1 1 1 1 1

Example: 2 sets, {1,3,5,6} with representative 1 and {2,4} with representative 2.

 i 1 2 3 4 5 6 p[i] 1 2 1 2 1 1 c[i] 4 2 1 1 1 1

 UNION (x, y)a ← FIND-SET (x)b ← FIND-SET (y) if c[a] > c[b]    p[b] ← a                      -- a has more kids; make b its child   c[a] ← c[a] + c[b]        -- and update the count of a to include b's kids else   p[a] ← b                      -- or vice-versa when b has more kids   c[b] ← c[b] + c[a]

The text uses a different, approximate method in order to make the analysis of the algorithm easier.

Question 21.4

What is the p[] and c[] after UNION(6, 4)?

 i 1 2 3 4 5 6 p[i] 1 2 1 2 1 1 c[i] 4 2 1 1 1 1

```Examples
```

Some UNION operations on sets with indices 1..6:
 a) 6 singleton sets     i 1  2  3  4  5  6 p[i]  1  2  3  4  5  6 c[i]  1  1  1  1  1  1 b) Union (3, 5)     i 1  2  3  4  5  6 p[i]  1  2  5  4  5  6 c[i]  1  1  1  1  2  1 c) Union (4, 2)     i 1  2  3  4  5  6 p[i]  1  2  5  2  5  6 c[i]  1  2  1  1  2  1 d) Union (2, 6)     i 1  2  3  4  5  6 p[i]  1  2  5  2  5  2 c[i]  1  3  1  1  2  1 e) Union (1, 4)     i 1  2  3  4  5  6 p[i]  2  2  5  2  5  2 c[i]  1  4  1  1  2  1 f) Union (3, 6)     i 1  2  3  4  5  6 p[i]  2  2  5  2  2  2 c[i]  1  6  1  1  2  1

Question 21.5

• In b), which element is the representative of {3, 5}?
• In d), what is the set(s)?
• In e), which element is the representative of {1, 2, 4, 6}?
• In f), what is the set and which element is the representative?
• After f), trace the execution of FIND-SET( 3 ). What is p?
 FIND-SET ( x )   if p[ x ] ≠ x then p[ x ] ← FIND-SET( p[ x ] )   return p[x]

Analysis - Amortized Average O(lg* n)

The analysis of the last UNION/FIND-SET algorithms (compression) is beyond the scope of an undergraduate class.

However, the results are pretty amazing: a UNION or FIND-SET operation takes O(lg* n) time amortized over all the operations (i.e., one particular instance may take longer, but overall, each one averages out to O(lg* n).

lg* is the iterated logarithm function; it's the number of times you can take the log base 2 of a number until the result is 1 or less (see text page 58).

This is a very slowly growing number as seen in the table below, where lg* 265536 = 5.

You will probably never need to do UNION/FIND-SET on sets with that many elements; indeed, there aren't even that many bytes in all the computers in the world.

So for all practical purposes, UNION/FIND-SET can be considered to run in constant time.

 lg* 2 = 1          = lg* 21                         lg 2 = 1lg* 4 = 2          = lg* 22                         lg 22 , lg 2 = 1 lg* 16 = 3        = lg*                       lg 222 , lg 22, lg 2=1 lg* 216 = 4       = lg* lg* 65536 = 4   = lg* lg* 265536 = 5  = lg* 265536 = 2.0035299304068464649790723515603e+19728