# Michael Data

When a joint distribution can't be computed directly, but marginals can, Gibbs Sampling is a MCMC algorithm to generate a sequence of samples from the joint probability distribution of two or more random variables.

1. Sample a new variable value one at a time from the variable's conditional distribution: Samples are dependent and form a Markov chain with stationary distribution P(x|evidence).

The process can be understood as a random walk in the space of all instantiations of a variable.

Converges iff the chain is irreducible and ergodic. These conditions are satisfied when all probabilities are positive.

##### Example
• For three variables x y z,
• initialize with z' y' z' (e.g. randomly)
• iterate:
• sample new x' ~ p(x|y',z',data)
• sample new y' ~ p(y|x',z',data)
• sample new z' ~ p(z|y',x',data)
• continue for some number of iterations (large number)
• each iteration consists of a sweep through the hidden variables or parmeters (here x,y,z)

After an initial burn-in period, the samples x',y',z' will be samples from the true joint distribution , providing empirical estimates.

##### Hacks

Reducing variance can speed up convergence. Due to the bias-variance tradeoff, reducing variance can reduce error, which leads to faster convergence.

Blocking?

Want to know .
Counting method: Count the number of samples where .

Averaging method: .

The averaging method is better because it is always nonzero when the Markov network is nonzero there.

##### Collapsed Gibbs Sampling

Sample a subset of variables, and integrate out the rest. In the case of a probability model, can treat parameters as variables to integrate out.

When processing a data point, compute directly, where .

Do this by integrating over the parameters of the base distribution.

#### MOG Example  