== Bucket Elimination ==

**Bucket Elimination** is a tree-decomposition method for improving belief propagation in Markov networks and Bayesian networks. Its clusters are called "buckets". Time-space trade-offs can be produced with the addition of cutset conditioning.

== Bucket Elimination Belief Propagation ==

**BE-Bel**, or "Bucket Elimination Belief Propagation", performs the belief-updating task:

- Define an ordering of the variables
- Assign each function to a **bucket**
  * a function is placed in the bucket of a variable in its scope
  * the latest-ordered eligible bucket gets precedence
  * once a bucket is summed over, the result is no longer a function of that bucket's variable
- In reverse order, sum over each bucket to marginalize
  * variables that have been observed, or about which we have some information, can use a restricted sum
  * buckets that sum to 1 are irrelevant and may be pruned

**Induced width**: the largest arity of any function involved in the elimination process. It reflects the number of variables that must be summed over together, and computation is exponential in it: time complexity is $O(n \cdot k^{w^* + 1})$ and space complexity is $O(n \cdot k^{w^*})$.

== Handling Observations ==

In the bucket of an observed variable, substitute the observed value into each function in the bucket. This can greatly improve the induced width of the graph, because the variable has effectively been eliminated.

== Time and Space Complexity ==

Time complexity is dominated by the largest bucket (in terms of the number of variables it contains), because computing that bucket requires iterating over $O(k^w)$ parameters, where $w$ equals the induced width of the graph. Total time complexity is $O(r \cdot k^{w^* + 1})$ and space complexity is $O(n \cdot k^{w^*})$.
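The BE-Bel procedure above can be sketched in Python. The factor representation, helper names, and the example chain network A → B → C are illustrative assumptions of mine, not from the text:

```python
# A minimal sketch of BE-Bel computing a marginal in a tiny Bayesian
# network. All variables are assumed binary for brevity.
from itertools import product

def multiply(factors):
    """Pointwise product of factors over the union of their scopes.

    A factor is (scope, table): scope is a tuple of variable names,
    table maps assignments over the scope to non-negative numbers.
    """
    scope = []
    for s, _ in factors:
        for v in s:
            if v not in scope:
                scope.append(v)
    table = {}
    for assign in product((0, 1), repeat=len(scope)):
        env = dict(zip(scope, assign))
        val = 1.0
        for s, t in factors:
            val *= t[tuple(env[v] for v in s)]
        table[assign] = val
    return tuple(scope), table

def sum_out(factor, var):
    """Sum a factor over one variable, removing it from the scope."""
    scope, table = factor
    i = scope.index(var)
    new_scope = scope[:i] + scope[i + 1:]
    new_table = {}
    for assign, val in table.items():
        key = assign[:i] + assign[i + 1:]
        new_table[key] = new_table.get(key, 0.0) + val
    return new_scope, new_table

def bucket_eliminate(factors, order, query):
    """Return the marginal on `query`, which must come first in `order`."""
    buckets = {v: [] for v in order}
    for f in factors:
        # each function goes in the bucket of its latest-ordered variable
        buckets[max(f[0], key=order.index)].append(f)
    for v in reversed(order):           # process buckets in reverse order
        if v == query or not buckets[v]:
            continue
        message = sum_out(multiply(buckets[v]), v)
        if message[0]:                  # route to its latest-ordered variable
            buckets[max(message[0], key=order.index)].append(message)
        # a scope-free message is a constant; normalization absorbs it
    _, table = multiply(buckets[query])
    z = sum(table.values())
    return {a[0]: val / z for a, val in table.items()}

# Chain A -> B -> C with illustrative CPTs; query P(C).
P_A = (("A",), {(1,): 0.6, (0,): 0.4})
P_BA = (("B", "A"), {(1, 1): 0.7, (0, 1): 0.3, (1, 0): 0.2, (0, 0): 0.8})
P_CB = (("C", "B"), {(1, 1): 0.9, (0, 1): 0.1, (1, 0): 0.3, (0, 0): 0.7})
marginal = bucket_eliminate([P_A, P_BA, P_CB], order=["C", "B", "A"], query="C")
```

Eliminating A first produces a message on B, which combines with P(C|B) to give the marginal on C; here the largest intermediate factor has two variables, matching the induced width of the chain.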
* $w^*$ is the induced width of the primal graph
* $k$ is the maximum domain size over all variables
* $r$ is the number of functions
* $n$ is the number of variables

== Pruning Nodes ==

Variables on which neither the query nor the evidence depends can be removed from the network; their buckets will sum to 1.

== Bucket Elimination MAP ==

**BE-MAP**, or "Bucket Elimination maximum a posteriori", identifies instantiations of a subset of the variables that maximize the a-posteriori probability:

$\max_{a,c} P(a,c) = \max_{a,c} \sum_{b,d,e} P(a,b,c,d,e) = \max_{a,c} P(a)P(c|a) \sum_{b,d,e} P(b|a)P(d|a,b)P(e|b,c)$

- Use bucket elimination over the summation variables
- Do the summation
- Use bucket elimination over the maximization variables
- Do the maximization

== Elim-MPE ==

**Elim-MPE**, or "Elimination for the Most Probable Explanation", finds a complete assignment with maximum probability:

$MPE = \max_{a,b,c,d,e} P(a,b,c,d,e)$

- Use bucket elimination, with maximization in place of summation, over all variables
- Recursively assign maximizing values to each variable in forward bucket order

== Bucket Trees ==

Using a tree decomposition allows for message passing up and down the tree. This lets **Bucket Tree Elimination** quickly find marginals on variables other than the root of the tree without constructing a new tree.

Message $\lambda^i$ is a function of all $\lambda$ functions and conditional probabilities previously assigned to bucket $i$, summed over variable $i$. These messages are calculated in reverse bucket order, and correspond directly to "pushing in" summation operations in the full joint formula.

Message $\pi^i$ is a function of all $\lambda$ functions and conditional probabilities previously assigned to bucket $i$, and is calculated in forward order after all $\lambda$ messages have been sent.

=== Bucket-Tree Elimination ===

To get marginal estimates on all variables, retain the bucket structure in the form of a tree.
Once the necessary variables are given as priors, use them to infer probabilities on other variables back up the tree.

* Each bucket has variables and functions
* Between nodes in the bucket tree, variable elimination is not symmetric
* Time complexity is $O(r \cdot \text{deg} \cdot k^{w^* + 1})$, where:
  * $\text{deg}$ is the maximum degree in the bucket tree
  * $w^*$ is the induced width of the variable graph
  * $k$ is the maximum domain size over all variables
  * $r$ is the number of functions
* Space complexity is $O(n \cdot k^{w^*})$

== Superbuckets ==

Combining local "clusters" of buckets that share many variables into **superbuckets** allows for a space reduction of messages in exchange for greater computation time. The space benefit is that buckets with common variables do not have to create additional messages to each other. Managing the size of these superbuckets allows for a time-space trade-off.
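The Elim-MPE scheme described earlier (maximize in reverse bucket order, then recover assignments in forward order) can be sketched on a tiny two-variable network. All names and numbers are illustrative assumptions, not from the text:

```python
# A sketch of Elim-MPE on a two-variable network P(A) * P(B|A).
# Summation is replaced by maximization in the backward pass, and the
# maximizing values are recovered in forward bucket order.
P_A = {1: 0.6, 0: 0.4}
P_B_given_A = {(1, 1): 0.7, (0, 1): 0.3, (1, 0): 0.2, (0, 0): 0.8}

# Backward pass: eliminate B by maximizing it out for each value of A,
# recording the argmax so the assignment can be recovered later.
msg = {a: max(P_B_given_A[(b, a)] for b in (0, 1)) for a in (0, 1)}
argmax_b = {a: max((0, 1), key=lambda b: P_B_given_A[(b, a)]) for a in (0, 1)}

# Bucket of A: combine with P(A) and maximize over A.
mpe_value = max(P_A[a] * msg[a] for a in (0, 1))
best_a = max((0, 1), key=lambda a: P_A[a] * msg[a])

# Forward pass: assign each variable in bucket order using the argmaxes.
best_b = argmax_b[best_a]
# Here the MPE assignment is (A=1, B=1) with probability 0.42.
```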

== Mini-Buckets ==

Bound complexity by splitting a bucket into mini-buckets. The split introduces incorrect independence assumptions, but it can reduce the effective treewidth of the graph.

The aim is to minimize the number of mini-buckets in the partition by including in each mini-bucket as many functions as possible within the `i`-bound (the maximum number of distinct variables allowed in a mini-bucket):

- Single-function mini-buckets are ordered by decreasing arity
- Each mini-bucket is combined into the left-most mini-bucket it can join without breaking the `i`-bound
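The greedy partitioning rule above can be sketched as follows; only the factor scopes are needed, and the example scopes are illustrative choices of mine:

```python
# A sketch of greedy mini-bucket partitioning. The i-bound caps the
# number of distinct variables in any mini-bucket.
def partition_bucket(scopes, i_bound):
    """Place each scope, largest arity first, into the left-most
    mini-bucket it can join without exceeding the i-bound."""
    minibuckets = []  # each entry: (set of variables, list of member scopes)
    for scope in sorted(scopes, key=len, reverse=True):
        for variables, members in minibuckets:
            if len(variables | set(scope)) <= i_bound:
                variables.update(scope)
                members.append(scope)
                break
        else:  # no existing mini-bucket fits: start a new one
            minibuckets.append((set(scope), [scope]))
    return minibuckets

# A bucket of B holding four functions, partitioned under an i-bound of 3:
parts = partition_bucket(
    [("B", "A"), ("B", "C"), ("B", "D"), ("B", "A", "C")], i_bound=3
)
```

With an `i`-bound of 3, the three functions over {B, A, C} share one mini-bucket, while the function over {B, D} starts a second.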

Bounded inference can be performed by using mini-buckets to identify lower and upper bounds on a joint probability.
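A minimal numeric illustration of why the split yields an upper bound in the maximization case (the numbers are mine, not from the text):

```python
# Maximizing each mini-bucket separately can only overestimate the joint
# maximum, since max_b f(b)*g(b) <= (max_b f(b)) * (max_b g(b)).
f = {0: 0.9, 1: 0.4}  # one mini-bucket's product, as a function of b
g = {0: 0.2, 1: 0.8}  # the other mini-bucket's product
exact = max(f[b] * g[b] for b in (0, 1))   # joint elimination of b
bound = max(f.values()) * max(g.values())  # mini-bucket bound
assert exact <= bound
```

Here joint elimination gives 0.32 while the mini-bucket bound is 0.72; tightening the `i`-bound trades this slack against computation.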