Directed tree-structured Markov networks can easily be transformed into an efficient **product-form distribution**, represented by a “cluster tree”. The **tree width** is the number of variables in the largest cluster, minus one. If the tree is a path, the **path width** is the tree width of the graph when decomposed into a chain.

Given a directed tree graph G, produce a product-form distribution by taking the product of each node’s conditional distribution given its parent, starting at the root of G.
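As a sketch (in Python, with an assumed three-node tree and made-up conditional tables — none of these names or numbers come from the text), the product-form distribution multiplies one conditional per node, starting from the root:

```python
# Sketch: joint probability of a directed tree as a product of
# conditionals, one factor per node, starting at the root.
# The tree, CPTs, and variable values here are illustrative assumptions.

# Tree: A is the root, with children B and C.
parent = {"A": None, "B": "A", "C": "A"}

# P(A), and conditional tables keyed by (child_value, parent_value).
prior_A = {0: 0.6, 1: 0.4}
cpt = {
    "B": {(0, 0): 0.7, (1, 0): 0.3, (0, 1): 0.2, (1, 1): 0.8},  # P(B | A)
    "C": {(0, 0): 0.5, (1, 0): 0.5, (0, 1): 0.9, (1, 1): 0.1},  # P(C | A)
}

def joint(assignment):
    """P(assignment) = P(root) * product over non-root nodes of P(node | parent)."""
    p = prior_A[assignment["A"]]
    for var, par in parent.items():
        if par is not None:
            p *= cpt[var][(assignment[var], assignment[par])]
    return p

# Sanity check: the joint probabilities sum to 1.
total = sum(joint({"A": a, "B": b, "C": c})
            for a in (0, 1) for b in (0, 1) for c in (0, 1))
```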

Alternatively, the product-form distribution can be constructed by multiplying the marginal distributions of the cliques together and dividing by the marginals of their intersections.

This method has the benefit that it applies to any graph of cliques arranged in a tree structure.
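A minimal check of this clique-product form, assuming a three-variable chain A–B–C with cliques {A, B} and {B, C} whose intersection is {B} (all probabilities made up for illustration):

```python
# Sketch: the product-form distribution of a clique tree, verified on a
# chain A - B - C with cliques {A,B} and {B,C} and intersection {B}.
# Numbers are illustrative assumptions.
import itertools

# Start from a chain factorization P(A,B,C) = P(A) P(B|A) P(C|B).
pA = {0: 0.3, 1: 0.7}
pB_A = {(0, 0): 0.6, (1, 0): 0.4, (0, 1): 0.2, (1, 1): 0.8}
pC_B = {(0, 0): 0.5, (1, 0): 0.5, (0, 1): 0.1, (1, 1): 0.9}

def p_abc(a, b, c):
    return pA[a] * pB_A[(b, a)] * pC_B[(c, b)]

# Marginals over each clique and over the intersection.
def p_ab(a, b): return sum(p_abc(a, b, c) for c in (0, 1))
def p_bc(b, c): return sum(p_abc(a, b, c) for a in (0, 1))
def p_b(b):     return sum(p_abc(a, b, c) for a in (0, 1) for c in (0, 1))

# Product of clique marginals divided by the intersection marginal
# recovers the full joint: P(A,B,C) = P(A,B) P(B,C) / P(B).
ok = all(
    abs(p_abc(a, b, c) - p_ab(a, b) * p_bc(b, c) / p_b(b)) < 1e-12
    for a, b, c in itertools.product((0, 1), repeat=3)
)
```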

For a tree-decomposition with an edge between clusters u and v, the **separator** is the set of variables shared by the two clusters:

sep(u, v) = χ(u) ∩ χ(v), where χ(u) denotes the set of variables in cluster u.

The **separator-width** is the size of the largest separator in the tree:

sep-width = max over tree edges (u, v) of |sep(u, v)|
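A small sketch of computing separators and separator-width; the clusters and tree edges are illustrative assumptions:

```python
# Sketch: separators and separator-width of a tree-decomposition.
# Cluster contents and tree edges are illustrative assumptions.

clusters = {
    1: {"A", "B", "C"},
    2: {"B", "C", "D"},
    3: {"D", "E"},
}
tree_edges = [(1, 2), (2, 3)]

# sep(u, v) = chi(u) ∩ chi(v) for each tree edge (u, v).
separators = {(u, v): clusters[u] & clusters[v] for u, v in tree_edges}

# Separator-width: size of the largest separator in the tree.
sep_width = max(len(s) for s in separators.values())
```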

So we desire to decompose cyclic graphs into a tree structure. Chordal I-maps are decomposable. The fill-in algorithm can test for chordality and `triangulate` the graph, i.e. make it chordal, if it is not.

Cutset conditioning can produce a tree-decomposition from a dependency graph.

This is a method for constructing a join-tree, a.k.a. “junction tree”.

Decompose a graphical model M, its scopes S, and its primal graph G, where X = {X_1, …, X_n} is the set of variables and F = {f_1, …, f_r} is the set of functions. Output is a **join-tree decomposition**.

- Select a variable ordering
- Triangulate the graph in reverse order
- Create a join-tree of the induced triangulated graph:
    - Identify all maximal cliques in the chordal graph
    - Create a tree structure over the cliques: in reverse variable order, add each clique to the tree by connecting it to the node with which it shares the largest subset of variables

- Place each input function in a cluster containing all variables in its scope
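The tree-construction step above can be sketched as follows; the cliques are illustrative assumptions, not taken from any particular graph:

```python
# Sketch of the tree-building step: process cliques in order and connect
# each to an earlier clique with which it shares the most variables.
# The cliques below (as if from some chordal graph) are assumptions.

cliques = [
    {"A", "B", "C"},
    {"B", "C", "D"},
    {"C", "D", "E"},
    {"D", "F"},
]

tree_edges = []
for i in range(1, len(cliques)):
    # Pick the predecessor sharing the largest subset of variables
    # (ties broken by taking the earliest such predecessor).
    j = max(range(i), key=lambda p: len(cliques[i] & cliques[p]))
    tree_edges.append((j, i))
```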

Simpler version:

- Use the fill-in algorithm if necessary
- Identify all cliques of G
- Order the cliques C_1, …, C_t by the rank of the highest vertex in each clique
- Connect each C_i to a predecessor C_j (j < i) that shares the most vertices with C_i
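The fill-in step can be sketched as follows, assuming a simple elimination-based triangulation; the example graph (a 4-cycle) is an illustrative assumption:

```python
# Sketch of the fill-in algorithm: eliminate vertices in a fixed order,
# connecting each vertex's remaining neighbors. If no edges need to be
# added, the graph was already chordal w.r.t. this ordering; the added
# edges otherwise triangulate (chordalize) the graph.
# The graph and ordering are illustrative assumptions.

def fill_in(edges, order):
    adj = {v: set() for v in order}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    fill = []
    eliminated = set()
    for v in order:
        # Neighbors of v that have not yet been eliminated.
        nbrs = sorted(adj[v] - eliminated)
        # Connect all pairs of those neighbors, recording fill edges.
        for i in range(len(nbrs)):
            for j in range(i + 1, len(nbrs)):
                a, b = nbrs[i], nbrs[j]
                if b not in adj[a]:
                    adj[a].add(b)
                    adj[b].add(a)
                    fill.append((a, b))
        eliminated.add(v)
    return fill

# A 4-cycle A-B-C-D-A is not chordal: one fill edge is required.
fill = fill_in([("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")],
               ["A", "B", "C", "D"])
```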

Once a join tree is created, “messages” can be passed between clusters by marginalizing over non-shared variables. Each cluster originates with a function of its member variables, but then receives “messages” in the form of functions over subsets of variables shared with adjacent clusters. After receiving all messages, a joint marginal can be computed from the product of all of these functions within the cluster.
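A minimal sketch of computing one such message, assuming two clusters that share a single variable B and a made-up local function:

```python
# Sketch: a "message" from one cluster to another marginalizes the
# sender's function over the variables not shared with the receiver.
# Clusters, domains, and the function table are illustrative assumptions.

cluster_u = ("A", "B")   # sender's scope
cluster_v = ("B", "C")   # receiver's scope
shared = [x for x in cluster_u if x in cluster_v]   # ["B"]

# Sender's local function over (A, B), as a table keyed by (a, b).
f_u = {(0, 0): 0.2, (0, 1): 0.4, (1, 0): 0.3, (1, 1): 0.6}

# message(B) = sum over the non-shared variable A of f_u(A, B)
message = {}
for b in (0, 1):
    message[(b,)] = sum(f_u[(a, b)] for a in (0, 1))
```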

A generalization of Bucket-Tree Elimination, **Cluster-Tree Elimination** (**CTE**) is a message-passing algorithm for producing a tree with [http://en.wikipedia.org/wiki/Explicit_and_implicit_methods model-explicit] clusters, so that each cluster becomes an independent representation of the variables within its scope.

- Input: a tree decomposition
- While not converged:
    - For every edge (u, v) in the tree:
        - If u has received messages from all adjacent vertices other than v:
            - Compute the message u→v: multiply all messages received by u from neighbors other than v with u’s local functions, then sum out the variables not shared with v
            - Send u→v to v

Convergence is guaranteed, and occurs when no new messages are created. If processing is performed from the leaves to the root and back, convergence is guaranteed after two passes.
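A two-pass sketch of CTE on an assumed two-cluster chain decomposition (all tables and numbers illustrative):

```python
# Sketch of two-pass CTE on a two-cluster chain decomposition:
# C1 = {A,B} holds P(A)P(B|A); C2 = {B,C} holds P(C|B); separator {B}.
# After one message in each direction, each cluster can compute exact
# marginals over its scope. All numbers are illustrative assumptions.

pA = {0: 0.3, 1: 0.7}
pB_A = {(0, 0): 0.6, (1, 0): 0.4, (0, 1): 0.2, (1, 1): 0.8}
pC_B = {(0, 0): 0.5, (1, 0): 0.5, (0, 1): 0.1, (1, 1): 0.9}

def f1(a, b): return pA[a] * pB_A[(b, a)]   # local function of C1
def f2(b, c): return pC_B[(c, b)]           # local function of C2

# Pass 1 (leaf to root): message C1 -> C2, summing out A.
m12 = {b: sum(f1(a, b) for a in (0, 1)) for b in (0, 1)}
# Pass 2 (root to leaf): message C2 -> C1, summing out C.
m21 = {b: sum(f2(b, c) for c in (0, 1)) for b in (0, 1)}

# Joint marginal at C2: local function times incoming message.
pBC = {(b, c): f2(b, c) * m12[b] for b in (0, 1) for c in (0, 1)}
pB_from_C2 = {b: pBC[(b, 0)] + pBC[(b, 1)] for b in (0, 1)}

# The same marginal computed at C1 must agree.
pAB = {(a, b): f1(a, b) * m21[b] for a in (0, 1) for b in (0, 1)}
pB_from_C1 = {b: pAB[(0, b)] + pAB[(1, b)] for b in (0, 1)}
```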

Time complexity: O(degree · (r + N) · k^(w+1))

Space complexity: O(N · k^sep)

where:

- N = number of vertices in the tree-decomposition
- w = the tree-width of the decomposition
- r = the number of functions in the model
- degree = the maximum degree in the tree-decomposition
- k = maximum domain size of any variable
- sep = the separator-width of the decomposition

A tree-decomposition can be transformed to reduce its separator-width by merging the clusters joined by the widest separators.

Cluster-tree elimination applied to the resulting decomposition will have time complexity O(N · k^a) and space complexity O(N · k^s), where a is the maximal number of variables in a cluster and s is the separator-width of the resulting decomposition.
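A sketch of one such merge step, assuming a small decomposition (clusters and edges made up for illustration):

```python
# Sketch: reducing separator-width by merging the two clusters joined
# by the widest separator. Clusters and edges are assumptions.

clusters = {1: {"A", "B", "C"}, 2: {"B", "C", "D"}, 3: {"D", "E"}}
edges = [(1, 2), (2, 3)]

# Find the tree edge with the largest separator and merge its endpoints
# into a single (larger) cluster, rewiring the remaining edges.
u, v = max(edges, key=lambda e: len(clusters[e[0]] & clusters[e[1]]))
clusters[u] |= clusters.pop(v)
edges = [(a if a != v else u, b if b != v else u)
         for a, b in edges if {a, b} != {u, v}]
```

Merging trades space for time: the new cluster is larger (more variables to enumerate), but the widest separator is gone.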