Michael Data

For the [http://en.wikipedia.org/wiki/Disjoint-set_data_structure disjoint-set union problem], want to be able to keep track of which set each of a group of objects belongs to.

This should be done very quickly, in amortized constant time.

Operations

find(element x)

• * query which subset a particular element belongs to
• * return a constant identifier to represent the set
• * Can use a particular element as the identifier, as long as it is used consistently for all members

union(element a, element b)

• * join two subsets together into the union of their elements
• * is 'destructive' in that there is one fewer subset after the merge
Solutions

The naive solution is to use a tree structure to represent each subset. Each element has a parent reference which points to another node in its set. The tree root acts as the set identifier.

• find(q):
• if q.parent:
• return find(q.parent)
• else:
• return q
• union(a,b):
• A = find(a)
• B = find(b)
• if A is not B:
• A.parent = B

To perform better, two primary optimizations are used: union by size and path compression.

union by size

When combining the trees of two sets, maintain the root of the tree with higher depth. The depth of a node can be referred to as its “rank” (a.k.a. “union by rank”).

path compression

When performing find(x), update the parent of each element to point directly to the tree root.

• find(q):
• if q.parent:
• p = find(q.parent)
• q.parent = p
• return p
• else:
• return q

Performance

For n elements and m operations, the naive method can run in O(mn) in the worst-case scenario. After implementing union by rank, it becomes O(m log n).

Path compression combined with union by size allows for amortized performance total or per operation, where is the very slow-growing inverse-[http://en.wikipedia.org/wiki/Ackermann_function Ackermann] function on the size of the data.

Applications
• * maintain sets for each connected sub-tree while building the min/max spanning tree

Dynamic shortest path problem

• * find the shortest path between two nodes in a computer network in which links are appearing and disappearing
• * maintain dynamic connected component sets 