22.5 Strongly Connected Component |
Document last modified: |
Overview
Graphs are of special interest in networking applications. Strongly connected components define network nodes connected by multiple, therefore redundant paths.
The Web, essentially one large directed graph.
To gather pages for search engines, Web crawlers move from one vertex (page) to
another along edges defined by hyperlinks.
Strongly Connected Components (SCC) define a special relationship among a set of vertices that can be exploited.
For example, to improve ranking relevancy, some search engines define sites that are referenced by others as authorities on a particular topic, Kleinberg, J. (1999) Authoritative sources in a hyperlinked environment. Journal of the Association for Computing Machinery, 46(5), 604–632.
The figure at right, Web sites A-F have pages with hyperlinks as edges. ABC is not while A, BC, EF and D are SCC's.
G is strongly connected, H is not.
Definition
A SCC of G = (V, E) is a maximal set of vertices C, where C Í V such that "u, v Î C, u
v, and v
u.
G
GT
Basic method for finding the SCC's of G:
- Perform DFS on G
- Perform DFS on GT, the transpose of G formed by reversing paths.
- The DFS subtrees common to both G and GT comprise a SCC.
Recall that roots of DFS trees occur where v.p = NIL for v Î V.
Example: Figures at right
(a) G. DFS G trees:
b c
| / \
e d g
| / \
a f hb.d < e.d < a.d < a.f < e.f < b.f
(b (e (a a) e) b)
Question 22.19 - Give the parenthesization for DFS(G) tree rooted at c.
(b) GT the transpose of G formed by reversing paths. DFS(
GT ) trees:
b c g h
| | |
a d f
|
e(c) GSCC
are those DFS subtrees common to G and GT: abe, cd, gf, h
Connections between SCC's:
- abe ® cd because G contains b ® c
- cd ® fg because G contains c ® g
- cd ® h because G contains d ® h
- etc.
u.f listed in decreasing order:
b, e, a, c, d, g, h, f
G a b c d e f g h d 13 11 1 8 12 3 2 5 f 14 16 10 9 15 4 7 6 p e NIL NIL c b g c g u.f listed in increasing order:
e, a, b, d, c, f, g, h
GT a b c d e f g h d 2 1 7 8 3 12 11 15 f 5 6 10 9 4 13 14 16 p b NIL NIL c a g NIL NIL
Strongly-Connected-Components (G)
|
Example - After running DFS (GT), Line 4 of the algorithm produces the SCC's:
u.f 4, 5, 6 u.p e, a, b
u.f 9, 10 u.p d, c
u.f 13, 14 u.p f, g
u.f 16 u.p h
Key idea
From the diagrams:
e, b Î C, a SCC.
e
b and b
e in G
e
b and b
e in GT
meaning there are paths between vertices b and e in both G and GT.
DFS(G) forms trees in one direction.
DFS(GT) forms trees in the opposite direction starting at the root of DFS(G) trees (topological sort).
DFS(G) forms DFS trees we'll call SCC C. The maximum u.f is a DFS tree root.
GT has reversed edges, a DFS(G) root now starts a DFS(GT).
DFS(GT) starts with SCC C such that x Î C, x.f is maximum (i.e. a DFS(G) root). Start at b.
DFS(GT) starts from x Î C, e.g. b, and visits all vertices in SCC C, constructing a DFS tree of those vertices.
DFS(GT) only considers DFS(G) trees starting from x Î C, and visits all vertices in SCC C, constructing a DFS tree of those vertices.
DFS(G) trees b c
| / \
e d g
| / \
a f hDFS(GT) trees b c g h
| | |
a d f
|
e
DFS(G) u.f listed in decreasing order:
b, e, a, c, d, g, h, f
G a b c d e f g h d 13 11 1 8 12 3 2 5 f 14 16 10 9 15 4 7 6 p e NIL NIL c b g c g DFS(GT) u.f listed in increasing order:
e, a, b, d, c, f, g, h
GT a b c d e f g h d 2 1 7 8 3 12 11 15 f 5 6 10 9 4 13 14 16 p b NIL NIL c a g NIL NIL Corollary 22.15
Let C and C' be distinct strongly connected components in directed graph G=(V, E).
Suppose that there is an edge (u, v) Î ET where u Î C and v Î C'.
Then f(C') > f(C).
Corollary says that since f(C') > f(C) for all C' ≠ C, there are no edges from C to C' in GT. Therefore, DFS will visit only vertices in C.
Which means that the depth-first tree rooted at x contains exactly the vertices of C.
Example: There is no path from component abe ® cd in GT at right because f(cd) > f(abe).
Question 22.20 Is f(gf) > f(cd) from the DFS GT at right? What does that mean?
The next root chosen in DFS GT is in SCC C' such that f(C) is maximum over all SCC’s other than C'.
DFS visits all vertices in C', but the only edges out of C' go to C, which we’ve already visited.
Therefore, the only tree edges will be to vertices in C'. We can continue the process.
Example: There is an edge in ET, cd ® abe, but have already visited abe (all BLACK) due to sorted order of f[u].
Each time we choose a root for DFS GT, it can reach only:
• vertices in its SCC—get tree edges to these,
• vertices in SCC’s already visited in DFS GT—get no tree edges to these.Recall that topological sort is a linear ordering of all vertices in a DAG G such that if G contains edge (u, v), then u appears before v in the ordering.
We are visiting vertices of GT in reverse of topologically sorted order.
| u.f listed in decreasing order: b, e, a, c, d, g, h, f
|
u.f listed in increasing order: e, a, b, d, c, f, g, h
|
|
|
||||||||||||||||||||||||||||||||||||||