Mixture models are a classic modeling approach with roots in biology research.

Each density p_k(x | theta_k) is called the kth mixture component.

z can be a binary indicator vector in which exactly one entry equals one.

Then let p(z = k) = pi_k be a mixture distribution over z.

example: mixture of arbitrary densities

z = 1 indicates a Gaussian

z = 2 indicates an exponential

z = 3 indicates a Gamma

example: mixture of Gaussians

Each p_k(x | theta_k) = N(x | mu_k, Sigma_k) is a Gaussian density with parameters theta_k = {mu_k, Sigma_k}.

For mixture weights pi_k, the mixture density is p(x) = sum_k pi_k N(x | mu_k, Sigma_k).
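The Gaussian-mixture density above can be sketched directly in a few lines. This is a minimal 1-D illustration with made-up weights, means, and standard deviations; the function and parameter names are not from the original notes.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """1-D Gaussian density N(x | mu, sigma^2)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def mixture_pdf(x, weights, mus, sigmas):
    """p(x) = sum_k pi_k * N(x | mu_k, sigma_k^2)."""
    return sum(w * gaussian_pdf(x, m, s)
               for w, m, s in zip(weights, mus, sigmas))

# Two-component example (illustrative parameters only)
weights = [0.3, 0.7]
mus = [-2.0, 1.0]
sigmas = [0.5, 1.5]
p = mixture_pdf(0.0, weights, mus, sigmas)
```

Note that the mixture weights must sum to one for p(x) to be a valid density.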

example: mixture of conditionally independent Bernoulli trials

Mixture models are a very flexible approach to density estimation: they allow writing a complex density as a combination of simpler ones. They are especially appropriate when the system being modeled genuinely consists of distinct physical component phenomena.

Assume that each data point is generated from only a single component.

- Generative model:
- for i = 1 to N:
-   k* ← sample a component for the ith data point ~ p(z = k)
-   x_i ← sample a data point from component k* ~ p_k(x | theta_k, z = k*)

- theta_1, ..., theta_K are the component parameters
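The ancestral-sampling steps above can be sketched as follows. This is a 1-D Gaussian-mixture sketch under assumed parameters; the helper names (`sample_component`, `generate`) are illustrative, not from the original notes.

```python
import random

def sample_component(weights):
    """Draw k ~ p(z = k) = pi_k by inverting the discrete CDF."""
    u, acc = random.random(), 0.0
    for k, w in enumerate(weights):
        acc += w
        if u < acc:
            return k
    return len(weights) - 1  # guard against floating-point rounding

def generate(n, weights, mus, sigmas):
    """Ancestral sampling: pick a component, then draw x from it."""
    data = []
    for _ in range(n):
        k = sample_component(weights)                 # z_i ~ p(z)
        data.append(random.gauss(mus[k], sigmas[k]))  # x_i ~ p_k(x | theta_k)
    return data
```

Each data point records which simple density generated it only implicitly; the component label z_i is discarded, which is exactly why inference later has to reason about it as a latent variable.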

The problem with fitting this model directly is that the log-likelihood contains a summation over the unknown component assignments inside the logarithm, which makes direct maximization intractable even in simple cases.

K-Means can be viewed as a non-probabilistic, hard-assignment version of EM for Gaussian mixtures.

Expectation Maximization (EM) is typically used to fit these models.
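One EM iteration for a 1-D Gaussian mixture can be sketched as below: the E-step computes responsibilities p(z = k | x_i) under the current parameters, and the M-step re-estimates the weights, means, and standard deviations from those soft counts. This is a minimal sketch, not a production implementation; names and the variance floor `1e-6` are assumptions.

```python
import math

def gauss(x, mu, sigma):
    """1-D Gaussian density N(x | mu, sigma^2)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def em_step(data, weights, mus, sigmas):
    """One EM iteration for a 1-D Gaussian mixture."""
    K = len(weights)
    # E-step: responsibilities r[i][k] = p(z = k | x_i)
    resp = []
    for x in data:
        num = [weights[k] * gauss(x, mus[k], sigmas[k]) for k in range(K)]
        tot = sum(num)
        resp.append([n / tot for n in num])
    # M-step: re-estimate parameters from soft counts
    Nk = [sum(r[k] for r in resp) for k in range(K)]
    weights = [Nk[k] / len(data) for k in range(K)]
    mus = [sum(r[k] * x for r, x in zip(resp, data)) / Nk[k]
           for k in range(K)]
    sigmas = [max(1e-6, math.sqrt(
        sum(r[k] * (x - mus[k]) ** 2 for r, x in zip(resp, data)) / Nk[k]))
        for k in range(K)]
    return weights, mus, sigmas
```

Iterating `em_step` monotonically increases the data log-likelihood until it converges to a local optimum; the result depends on the initialization.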

Kernel Density Estimation can work well in low dimensions but does not scale well to high-dimensional data.
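For contrast with the parametric mixture approach, a Gaussian-kernel KDE places one kernel on every data point. A minimal 1-D sketch, with an assumed fixed bandwidth (bandwidth selection is a topic of its own):

```python
import math

def kde(x, data, bandwidth):
    """Gaussian kernel density estimate at point x:
    (1 / (n * h)) * sum_i K((x - x_i) / h), K = standard normal pdf."""
    n = len(data)
    s = sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data)
    return s / (n * bandwidth * math.sqrt(2 * math.pi))
```

Unlike a K-component mixture, evaluating this estimate costs O(n) per query point, which is one reason KDE becomes unwieldy on large, high-dimensional datasets.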