
Tree Decomposition

Directed tree-structured Markov networks can easily be transformed into an efficient product-form distribution, represented by a “cluster tree”. The tree-width is the maximum number of variables within a single cluster, minus one. If the tree is a path, the path-width is the tree-width of the decomposition into a chain graph.
$w^* \leq pw^* \leq \dots$
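
As a small worked example, the width of a given decomposition is just the size of its largest cluster minus one. The cluster scopes below are the pairwise cliques used in the examples further down this page:

```python
# Width of a tree decomposition: size of the largest cluster minus one.
clusters = [{1, 3}, {2, 3}, {3, 4}, {4, 5}, {4, 6}, {4, 7}]
width = max(len(c) for c in clusters) - 1
print(width)  # 1 -- every cluster here contains two variables
```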

Product-Form Inspection Methods

Directed Trees

Given a directed tree graph G, produce a product-form distribution by taking the product of the root's marginal and each remaining node's probability conditioned on its parent, working outward from the root of G.

$p(1,\dots,7) = p(3)p(1|3)p(2|3)p(4|3)p(5|4)p(6|4)p(7|4)$
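
A minimal sketch of evaluating this product form, assuming binary variables and made-up conditional tables; only the tree structure (root 3, parents as in the formula above) comes from the example:

```python
# Product-form evaluation on the directed tree rooted at 3.
# All variables are binary; the probability tables are placeholders.
parent = {1: 3, 2: 3, 4: 3, 5: 4, 6: 4, 7: 4}    # node -> parent; 3 is the root

p_root = [0.6, 0.4]                               # p(3 = v)
# p(child = c | parent = p): a 2x2 table per child, rows indexed by parent value
p_cond = {child: [[0.7, 0.3], [0.2, 0.8]] for child in parent}

def joint(a):
    """p(1,...,7) for a full assignment {variable: value}."""
    prob = p_root[a[3]]
    for child, par in parent.items():
        prob *= p_cond[child][a[par]][a[child]]
    return prob

print(joint({1: 0, 2: 1, 3: 0, 4: 1, 5: 0, 6: 1, 7: 1}))
```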

Product Division

Alternately, the product-form distribution can be constructed by multiplying together the joint distributions of the cliques and dividing by the marginals of their intersections.
$p(1,\dots,7) = \frac{p(1,3)p(2,3)}{p(3)} * \frac{p(3,4)}{p(3)} * \frac{p(4,5)}{p(4)} * \frac{p(4,6)}{p(4)} * \frac{p(4,7)}{p(4)}$

This method has the benefit that it applies to any graph of cliques arranged in a tree structure.
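
A sketch that checks the clique-product / separator-division formula against the made-up directed-tree model from the previous snippet; brute-force marginalization is used only because the example is tiny:

```python
from itertools import product

# Verify p(1..7) = prod over cliques of p(clique) / prod over separators of p(sep).
parent = {1: 3, 2: 3, 4: 3, 5: 4, 6: 4, 7: 4}
p_root = [0.6, 0.4]
p_cond = {child: [[0.7, 0.3], [0.2, 0.8]] for child in parent}

def joint(a):
    prob = p_root[a[3]]
    for child, par in parent.items():
        prob *= p_cond[child][a[par]][a[child]]
    return prob

def marginal(variables):
    """Brute-force marginal table over a subset of the (binary) variables."""
    table = {}
    for values in product([0, 1], repeat=7):
        a = dict(zip(range(1, 8), values))
        key = tuple(a[v] for v in variables)
        table[key] = table.get(key, 0.0) + joint(a)
    return table

cliques = [(1, 3), (2, 3), (3, 4), (4, 5), (4, 6), (4, 7)]
separators = [(3,), (3,), (4,), (4,), (4,)]        # one per edge of the clique tree

a = {1: 0, 2: 1, 3: 0, 4: 1, 5: 0, 6: 1, 7: 1}     # an arbitrary assignment
numerator = 1.0
for c in cliques:
    numerator *= marginal(c)[tuple(a[v] for v in c)]
denominator = 1.0
for s in separators:
    denominator *= marginal(s)[tuple(a[v] for v in s)]

print(abs(numerator / denominator - joint(a)) < 1e-12)   # True: the two forms agree
```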

Separation

For a tree-decomposition with an edge between $B_i$ and $B_j$,
$\text{sep}(i,j) = \text{scope}(B_i) \cap \text{scope}(B_j)$

The separator-width is the size of the largest separator in the tree.

Elimination

For a tree-decomposition with an edge between $B_i$ and $B_j$,
$\text{elim}(i,j) = \text{scope}(B_i) - \text{sep}(i,j) = \text{scope}(B_i) - (\text{scope}(B_i) \cap \text{scope}(B_j))$
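
A minimal sketch of both definitions, with hypothetical cluster scopes represented as Python sets:

```python
# Separator and eliminator for an edge (i, j) of a tree decomposition.
def sep(scope_i, scope_j):
    return scope_i & scope_j

def elim(scope_i, scope_j):
    return scope_i - scope_j        # = scope_i - sep(scope_i, scope_j)

B = {1: {1, 3}, 2: {3, 4}, 3: {4, 5, 6}}   # illustrative cluster scopes
edges = [(1, 2), (2, 3)]
print(sep(B[2], B[3]), elim(B[2], B[3]))              # {4} {3}
print(max(len(sep(B[i], B[j])) for i, j in edges))    # separator-width: 1
```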

Decomposition

So we want to decompose graphs containing cycles into a tree structure. Chordal I-maps are decomposable. The fill-in algorithm can test for chordality and triangulate the graph (make it chordal) if it is not.
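
A minimal sketch of a fill-in style triangulation, assuming an elimination ordering is already given: processing vertices from last to first, connect each vertex's earlier neighbors; the edges added are the fill-in, and (with a suitable ordering, e.g. from max-cardinality search) an empty fill-in means the graph was already chordal.

```python
from itertools import combinations

def triangulate(adj, order):
    """adj: {vertex: set of neighbors}; order: elimination ordering (first..last).
    Returns the triangulated adjacency and the list of fill-in edges added."""
    adj = {v: set(ns) for v, ns in adj.items()}          # work on a copy
    position = {v: i for i, v in enumerate(order)}
    fill = []
    for v in reversed(order):
        earlier = [u for u in adj[v] if position[u] < position[v]]
        for a, b in combinations(earlier, 2):            # connect earlier neighbors
            if b not in adj[a]:
                adj[a].add(b); adj[b].add(a)
                fill.append((a, b))
    return adj, fill

cycle = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {1, 3}}     # a 4-cycle: not chordal
chordal, fill = triangulate(cycle, order=[1, 2, 3, 4])
print(fill)    # one chord is added, e.g. [(1, 3)]
```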

Cutset conditioning can produce a tree-decomposition from a dependency graph.

Minimal Decomposition

A tree decomposition is minimal if, for every pair of neighboring clusters, neither cluster's scope is a subset of the other's.
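
A one-function sketch of that check, with illustrative cluster scopes:

```python
# A decomposition is minimal if no cluster's scope is contained in a neighbor's.
def is_minimal(scopes, edges):
    return all(not (scopes[i] <= scopes[j] or scopes[j] <= scopes[i])
               for i, j in edges)

scopes = {1: {1, 3}, 2: {3, 4}, 3: {3}}
print(is_minimal(scopes, edges=[(1, 2)]))            # True
print(is_minimal(scopes, edges=[(1, 2), (2, 3)]))    # False: {3} is inside {3, 4}
```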

Join-Tree Clustering

This is a method for constructing a join-tree, a.k.a. a “junction tree”.
Input: a graphical model $\mathcal{M} = \langle X,D,F,\Pi \rangle$, its scopes $S = S_1,\dots,S_r$, and its primal graph $G=(X,E)$, where X = {$X_1,\dots,X_n$} and F = {$f_1,\dots,f_r$}. Output: a join-tree decomposition.

  1. Select a variable ordering $d = (X_1,\dots,X_n)$
  2. Triangulate the graph in reverse order of $d$
  3. Create a join-tree of the induced triangulated graph:
    1. Identify all maximal cliques in the chordal graph
    2. Create a tree structure over the cliques: in reverse variable order, add each clique to the tree by connecting it to an existing node with which it shares the largest subset of variables
  4. Place each input function in a cluster containing all variables in its scope

Simpler version:

  1. Use the fill-in algorithm if necessary
  2. Identify all cliques of G
  3. Order the cliques $C_1,\dots,C_t$ by the rank of the highest vertex in each clique
  4. Connect each $C_i$ to a predecessor $C_j$ $(j < i)$ that shares the most vertices with $C_i$ (see the sketch below)
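
A sketch of the simpler construction, assuming the maximal cliques and a vertex ordering are already known (the cliques below are the pairwise cliques from the earlier example):

```python
def join_tree(cliques, vertex_rank):
    """Order cliques by their highest-ranked vertex, then connect each clique
    to the earlier clique with which it shares the most vertices."""
    cliques = sorted(cliques, key=lambda c: max(vertex_rank[v] for v in c))
    edges = []
    for i in range(1, len(cliques)):
        j = max(range(i), key=lambda p: len(cliques[i] & cliques[p]))
        edges.append((cliques[j], cliques[i]))
    return cliques, edges

cliques = [{1, 3}, {2, 3}, {3, 4}, {4, 5}, {4, 6}, {4, 7}]
rank = {v: v for v in range(1, 8)}                 # use the natural order 1..7
_, edges = join_tree(cliques, rank)
for a, b in edges:
    print(sorted(a), "--", sorted(b))
```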

Once a join tree is created, “messages” can be passed between clusters by marginalizing over non-shared variables. Each cluster originates with a function over its member variables, and then receives “messages” in the form of functions over subsets of variables shared with adjacent clusters. After receiving all messages, a joint marginal can be computed from the product of all of these functions within the cluster.

Cluster-Tree Elimination

A generalization of Bucket-Tree Elimination, Cluster-Tree Elimination (CTE) is a message-passing algorithm for producing a tree with model-explicit clusters: each cluster becomes an independent representation of the variables within its scope.

  • Input: a tree decomposition
  • While not converged:
    • For every edge (u,v) in the tree:
      • If u has received messages from all adjacent vertices other than v:
        • Compute the message $u \to v$: multiply u's local functions with all messages u has received from neighbors other than v, then sum out the variables not shared with v (i.e. $\text{elim}(u,v)$)
        • Send the message $u \to v$ to v

Convergence is guaranteed, and occurs when no new messages are created. If messages are processed from the leaves to the root and then back, convergence is guaranteed after two passes.
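
A compact sketch of CTE with table factors over binary variables, on a made-up path-shaped decomposition {1,3} -- {3,4} -- {4,5}; the factor helpers are minimal stand-ins rather than a real factor library, and the uniform tables are placeholders:

```python
from itertools import product

# A factor is (scope, table): scope is a sorted list of variables and table maps
# a tuple of values (in scope order) to a number.  All variables are binary.

def multiply(f, g):
    """Pointwise product of two factors."""
    (sf, tf), (sg, tg) = f, g
    scope = sorted(set(sf) | set(sg))
    table = {}
    for values in product([0, 1], repeat=len(scope)):
        a = dict(zip(scope, values))
        table[values] = tf[tuple(a[v] for v in sf)] * tg[tuple(a[v] for v in sg)]
    return scope, table

def sum_out(f, variables):
    """Marginalize the given variables out of a factor."""
    sf, tf = f
    keep = [v for v in sf if v not in variables]
    table = {}
    for values, val in tf.items():
        key = tuple(x for v, x in zip(sf, values) if v in keep)
        table[key] = table.get(key, 0.0) + val
    return keep, table

# Tree decomposition: clusters u={1,3}, v={3,4}, w={4,5} arranged in a path.
scopes = {"u": [1, 3], "v": [3, 4], "w": [4, 5]}
local = {c: (s, {vals: 1.0 for vals in product([0, 1], repeat=len(s))})
         for c, s in scopes.items()}               # uniform placeholder functions
neighbors = {"u": ["v"], "v": ["u", "w"], "w": ["v"]}

messages = {}                                      # (sender, receiver) -> factor
while len(messages) < 4:                           # two messages per edge
    for u in neighbors:
        for v in neighbors[u]:
            ready = all((w, u) in messages for w in neighbors[u] if w != v)
            if ready and (u, v) not in messages:
                f = local[u]
                for w in neighbors[u]:
                    if w != v:
                        f = multiply(f, messages[(w, u)])
                elim = set(f[0]) - set(scopes[v])  # variables not shared with v
                messages[(u, v)] = sum_out(f, elim)

# Joint marginal of cluster v: its own function times all incoming messages.
belief = local["v"]
for w in neighbors["v"]:
    belief = multiply(belief, messages[(w, "v")])
print(belief)
```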

Time complexity: $O((r+N) * \text{degree} * k^{w+1})$
Space complexity: $O(N k^{\text{max separator size}})$

  • N = number of vertices in the tree-decomposition
  • w = the tree-width of the decomposition
  • r = the number of functions in the model
  • degree = the maximum degree in the tree-decomposition
  • k = maximum domain size of any variable
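
Plugging illustrative numbers into these bounds (the values below are made up to roughly match the small examples on this page):

```python
# CTE complexity bounds: time O((r + N) * degree * k^(w+1)), space O(N * k^sep).
def cte_bounds(N, r, degree, k, w, sep):
    return (r + N) * degree * k ** (w + 1), N * k ** sep

print(cte_bounds(N=6, r=7, degree=4, k=2, w=1, sep=1))   # (208, 12)
```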

Super Clusters

A tree-decomposition can be transformed to reduce its separator-width by merging the neighboring clusters joined by the largest separators into “super clusters”.
Cluster-tree elimination applied to the resulting graph will have time complexity $O(m * \text{degree} * e^{r})$ and space $O(n * e^{s})$, where $r$ is the maximal number of nodes in a cluster and $s$ is the separator-width of the resulting decomposition.
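
A sketch of one super-cluster step under these assumptions: merge the pair of neighboring clusters joined by the largest separator, trading larger clusters for a smaller separator-width (cluster scopes and edges below are illustrative):

```python
def merge_largest_separator(scopes, edges):
    """Merge the two neighboring clusters with the largest separator."""
    i, j = max(edges, key=lambda e: len(scopes[e[0]] & scopes[e[1]]))
    merged = {c: s for c, s in scopes.items() if c not in (i, j)}
    merged[i] = scopes[i] | scopes[j]              # keep i's label for the super cluster
    new_edges = [(a if a != j else i, b if b != j else i)
                 for a, b in edges if {a, b} != {i, j}]
    return merged, new_edges

scopes = {1: {1, 2, 3}, 2: {2, 3, 4}, 3: {4, 5}}
edges = [(1, 2), (2, 3)]
print(merge_largest_separator(scopes, edges))
# ({3: {4, 5}, 1: {1, 2, 3, 4}}, [(1, 3)]) -- separator-width drops from 2 to 1
```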