Michael Data

K-Means

A non-probabilistic variant of Expectation-Maximization that uses hard cluster memberships rather than soft responsibilities.
Given data $D = \{x_i\},\ i = 1, \dots, N$, and the number of clusters $K$:

  1. Initialize (either option)
    * Randomly select K data points and use them as the initial means
    * Randomly assign each data point to one of K clusters, then compute the mean of each cluster

  2. Update
    1. Assign each data point to its closest mean (center) using Euclidean distance
    2. Recompute each of the K means from its member data points
    3. Check for convergence
  3. Convergence (either criterion; see the sketch below)
    * When the sum of squared distances from each point to its assigned mean stops decreasing
    * When no data point changes cluster membership between iterations
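A minimal NumPy sketch of the procedure above, assuming Euclidean distance and random-point initialization; the function name `kmeans`, the `max_iters` cap, and the seed handling are illustrative choices, not part of the original notes.

```python
import numpy as np

def kmeans(X, K, max_iters=100, seed=0):
    """Plain K-means with hard assignments and Euclidean distance (illustrative sketch)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    N = X.shape[0]

    # 1. Initialize: randomly select K data points as the initial means
    means = X[rng.choice(N, size=K, replace=False)]

    prev_labels = None
    for _ in range(max_iters):
        # 2.1 Assign each point to its closest mean (hard membership)
        dists = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)  # shape (N, K)
        labels = dists.argmin(axis=1)

        # 3. Convergence: stop when no point changes membership between iterations
        if prev_labels is not None and np.array_equal(labels, prev_labels):
            break
        prev_labels = labels

        # 2.2 Recompute each mean from its member data points
        for k in range(K):
            members = X[labels == k]
            if len(members) > 0:  # guard against an empty cluster
                means[k] = members.mean(axis=0)

    # Sum of squared distances (SSE) for the final assignment
    sse = ((X - means[labels]) ** 2).sum()
    return means, labels, sse


# Example usage on synthetic 2-D data with three well-separated groups
X = np.vstack([np.random.randn(50, 2) + c for c in ([0, 0], [5, 5], [0, 5])])
means, labels, sse = kmeans(X, K=3)
```

Monitoring `sse` across iterations (or across random restarts) corresponds to the first convergence criterion; the membership-comparison check implemented above corresponds to the second.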