As a special case of finite mixture models, place a Gaussian at each of the N data points. Then take the sum of these Gaussians. The variance of each Gaussian must be chosen as a parameter, and has a large impact on the result.

is often chosen as . is referred to as the **bandwidth** and can be selected by cross validation.

KDE doesn't scale well into high dimensions. The number of data points required scales with . It may still work well in low dimensions though.