22.5 Strongly Connected Component


Document last modified: 

Overview

Graphs are of special interest in networking applications. Strongly connected components define network nodes connected by multiple, therefore redundant paths.

The Web, essentially one large directed graph.

To gather pages for search engines, Web crawlers move from one vertex (page) to another along edges defined by hyperlinks.

Strongly Connected Components (SCC) define a special relationship among a set of vertices that can be exploited.

For example, to improve ranking relevancy, some search engines define sites that are referenced by others as authorities on a particular topic, Kleinberg, J. (1999) Authoritative sources in a hyperlinked environment. Journal of the Association for Computing Machinery, 46(5), 604–632.

The figure at right, Web sites A-F have pages with hyperlinks as edges. ABC is not while A, BC, EF and D are SCC's.

 

G is strongly connected, H is not.

Definition

A SCC of G = (V, E) is a maximal set of vertices C, where C Í V such that "u, v Î C, u v, and v u.

G

GT

Basic method for finding the SCC's of G:

  1. Perform DFS on G
  2. Perform DFS on GT, the transpose of G formed by reversing paths.
  3. The DFS subtrees common to both G and GT comprise a SCC.

Recall that roots of DFS trees occur where v.p = NIL for v Î V.

Example: Figures at right

    (a) G.    DFS G trees:  

b        c
|       /   \
e     d     g
|           /  \
a         f     h

b.d < e.d < a.d < a.f < e.f < b.f

  (b    (e       (a      a)     e)     b)

Question 22.19 - Give the parenthesization for DFS(G) tree rooted at c.

(b) GT the transpose of G formed by reversing paths.    DFS( GT ) trees:

b      c     g     h
|       |     |       
a      d     f
|            

     (c) GSCC are those DFS subtrees common to G and GT: abe, cd, gf, h

Connections between SCC's:

  • abe ® cd because G contains b ® c
  • cd ®  fg because G contains c ® g
  • cd ®  h because G contains d ® h
  • etc.
u.f listed in decreasing order: 
     b, e, a, c, d, g, h, f
G a b c d e f g h
d 13 11 1 8 12 3 2 5
f 14 16 10 9 15 4 7 6
p e NIL NIL c b g c g
u.f listed in increasing order:
     e, a, b, d, c, f, g, h
GT a b c d e f g h
d 2 1 7 8 3 12 11 15
f 5 6 10 9 4 13 14 16
p b NIL NIL c a g NIL NIL
Strongly-Connected-Components (G)
1    Call DFS (G) to compute finishing times u.f for each vertex u
2 Compute GT
3 Call DFS (GT), but in the main loop of DFS, consider the vertices in order of decreasing u.f (as computed in line 1)
4 Output the vertices of each tree in the depth-first forest formed in line 3 as a separate strongly connected component. 
Do this by starting with the smallest u.f value (from GT) and use the predecessor array p, from GT
Components that are connected in G are still connected; e.g. (abe, cd) since G contains (b, c).

Example - After running DFS (GT), Line 4 of the algorithm produces the SCC's:

u.f 4, 5, 6

u.p e, a, b

u.f 9, 10

u.p d, c

u.f 13, 14

u.p f, g

u.f 16

u.p h

Key idea

From the diagrams:

e, b Î C, a SCC.

e b and b e in G

e b and b e in GT

meaning there are paths between vertices b and e in both G and GT.

DFS(G) forms trees in one direction.

DFS(GT) forms trees in the opposite direction starting at the root of DFS(G) trees (topological sort).

DFS(G) forms DFS trees we'll call SCC C.  The maximum u.f is a DFS tree root.

GT has reversed edges, a DFS(G) root now starts a DFS(GT).

DFS(GT) starts with SCC C such that x Î C, x.f is maximum (i.e. a DFS(G) root). Start at b.

DFS(GT) starts from x Î C, e.g. b, and visits all vertices in SCC C, constructing a DFS tree of those vertices.

DFS(GT) only considers DFS(G) trees starting from x Î C, and visits all vertices in SCC C, constructing a DFS tree of those vertices.

DFS(G) trees

b        c
|       /   \
e     d     g
|           /  \
a         f     h

DFS(GT) trees

b      c     g     h
|       |     |       
a      d     f
|            

DFS(G) u.f listed in decreasing order: 
     b, e, a, c, d, g, h, f
G a b c d e f g h
d 13 11 1 8 12 3 2 5
f 14 16 10 9 15 4 7 6
p e NIL NIL c b g c g
DFS(GT) u.f listed in increasing order:
     e, a, b, d, c, f, g, h
GT a b c d e f g h
d 2 1 7 8 3 12 11 15
f 5 6 10 9 4 13 14 16
p b NIL NIL c a g NIL NIL

Corollary 22.15

Let C and C' be distinct strongly connected components in directed graph G=(V, E).

Suppose that there is an edge (u, v) Î ET where u Î C and v Î C'.

Then f(C') > f(C).

Corollary says that since f(C') > f(C) for all C' ≠ C, there are no edges from C to C' in GT. Therefore, DFS will visit only vertices in C.

Which means that the depth-first tree rooted at x contains exactly the vertices of C.

Example: There is no path from component abe ® cd in GT at right because f(cd) > f(abe).

Question 22.20 Is f(gf) > f(cd) from the DFS GT at right? What does that mean? 

The next root chosen in DFS GT is in SCC C' such that f(C) is maximum over all SCC’s other than C'.

DFS visits all vertices in C', but the only edges out of C' go to C, which we’ve already visited.

Therefore, the only tree edges will be to vertices in C'. We can continue the process.

Example: There is an edge in ET, cd ® abe, but have already visited abe (all BLACK) due to sorted order of f[u].

Each time we choose a root for DFS GT, it can reach only:
• vertices in its SCC—get tree edges to these,
• vertices in SCC’s already visited in DFS GT—get no tree edges to these.

Recall that topological sort is a linear ordering of all vertices in a DAG G such that if G contains edge (u, v), then u appears before v in the ordering.

We are visiting vertices of GT in reverse of topologically sorted order.

u.f listed in decreasing order: 
     b, e, a, c, d, g, h, f
G a b c d e f g h
d 13 11 1 8 12 3 2 5
f 14 16 10 9 15 4 7 6
p e NIL NIL c b g c g
u.f listed in increasing order:
     e, a, b, d, c, f, g, h
GT a b c d e f g h
d 2 1 7 8 3 12 11 15
f 5 6 10 9 4 13 14 16
p b NIL NIL c a g NIL NIL
DFS (G)
       -- Initialize arrays
1 for each vertex u Î G.V do
2    u.color ¬ WHITE
3    u.p ¬ NIL
4 time ¬ 0
5 for each vertex u Î G.V do
6     if u.color = WHITE then
7        DFS-Visit (G, u)
DFS-Visit (G, u)
1       u.color ¬ GRAY
2 time ¬ time + 1
3 u.d ¬ time
4 for each vertex v Î G.Adj[u] do
5    if v.color = WHITE then
6       v.p ¬ u
7       DFS-Visit (G, v)
8 u.color ¬ BLACK
9 u.f ¬ time ¬ time + 1