Sparse filtering is a feature learning algorithm introduced in "Sparse Filtering" (Ngiam et al., 2011). For high-dimensional inputs, sparse filtering may converge faster than sparse coding, ICA, and sparse autoencoders.

The Sparse Filtering Algorithm

Execution

Execute by minimizing the objective function:

$$ \text{minimize} \sum_{i=1}^{M} {\left\Vert \hat{f}^{(i)} \right\Vert}_1 = \sum_{i=1}^{M} {\left\Vert \frac{\tilde{f}^{(i)}}{\Vert \tilde{f}^{(i)} \Vert_2} \right\Vert}_1 $$

The input to this objective is not the raw data itself but a feature matrix: one row per feature, one column per observation, where each entry is that feature's activation on that observation.
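
As a rough illustration, here is a minimal NumPy sketch of the objective for a single layer. Following Ngiam et al., the feature activations use the soft-absolute function $\sqrt{(Wx)^2 + \epsilon}$; the function and variable names here are illustrative, not from the paper.

import numpy as np

def sparse_filtering_objective(W, X, eps=1e-8):
    """Single-layer sparse filtering objective.

    W : (n_features, n_inputs) weight matrix
    X : (n_inputs, n_examples) data matrix
    """
    # Soft-absolute feature activations, f = sqrt((W x)^2 + eps)
    F = np.sqrt((W @ X) ** 2 + eps)  # rows: features, columns: examples
    # Normalize each feature (row) across all examples
    F_tilde = F / np.linalg.norm(F, axis=1, keepdims=True)
    # Normalize each example (column) onto the unit l2-ball
    F_hat = F_tilde / np.linalg.norm(F_tilde, axis=0, keepdims=True)
    # l1 sparsity penalty summed over all normalized entries
    return np.abs(F_hat).sum()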

This objective describes a single layer of feature learning. It can be applied to repeated layers in a greedy manner, as described below, to train a deep network.

Ngiam et al. run their experiments using an off-the-shelf L-BFGS optimizer1)2).
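
As a toy illustration of that, the objective above could be handed to an off-the-shelf routine such as SciPy's L-BFGS-B. This sketch supplies no analytic gradient, so SciPy falls back to finite-difference estimates, which is only workable at very small scale; the helper name is mine, not the paper's.

from scipy.optimize import minimize

def train_layer(X, n_features, max_iter=200):
    """Fit one sparse filtering layer with an off-the-shelf optimizer."""
    n_inputs = X.shape[0]
    w0 = 0.01 * np.random.randn(n_features * n_inputs)

    def objective(w):
        W = w.reshape(n_features, n_inputs)
        return sparse_filtering_objective(W, X)

    # No jac given: L-BFGS-B approximates gradients by finite differences.
    res = minimize(objective, w0, method="L-BFGS-B",
                   options={"maxiter": max_iter})
    return res.x.reshape(n_features, n_inputs)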

Interpretation
  1. Normalize each feature (row) for uniform activity across examples using its $l_2$ norm
  2. Normalize each example (column) onto the unit $l_2$-ball
  3. Optimize the normalized features for sparseness using the $l_1$ norm (written out explicitly below)
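
In the notation of the objective above, with $f_j$ the $j$-th row of the feature matrix (one feature across all $M$ examples) and $\tilde{f}^{(i)}$ the $i$-th column (all features for one example), the three steps are:

$$ \tilde{f}_j = \frac{f_j}{\Vert f_j \Vert_2}, \qquad \hat{f}^{(i)} = \frac{\tilde{f}^{(i)}}{\Vert \tilde{f}^{(i)} \Vert_2}, \qquad \text{minimize} \sum_{i=1}^{M} {\left\Vert \hat{f}^{(i)} \right\Vert}_1 $$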

See the introducing publication for detailed derivations3).

Sparsity Properties

Normalizing across features produces implicit competition among them: because each example is constrained to the $l_2$-ball, increasing one feature's activation forces the others to shrink.

  • Population Sparsity
    • Each observation is represented by only a small number of active features.
  • Lifetime Sparsity
    • Each feature is active on only a small number of observations.
  • High Dispersal
    • No feature dominates the rest; every feature has similar activity statistics across the data. This acts as a counterbalance to the previous two properties.

Training Deep Networks

Sparse filtering, as a framework, is agnostic to how the feature matrix it consumes was produced. Ngiam et al. used greedy layer-wise training4): train one layer with sparse filtering, fix its weights, and train the next layer on its outputs, as sketched below.
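
A minimal sketch of that greedy scheme, reusing the hypothetical train_layer helper above. Feeding the fully normalized features of one layer as the input to the next is one plausible reading of the layer-wise procedure, not necessarily the paper's exact pipeline.

def train_deep(X, layer_sizes, eps=1e-8):
    """Greedily train a stack of sparse filtering layers."""
    weights, inputs = [], X
    for n_features in layer_sizes:
        # Train this layer, then freeze it
        W = train_layer(inputs, n_features)
        weights.append(W)
        # Recompute its normalized features as the next layer's input
        F = np.sqrt((W @ inputs) ** 2 + eps)
        F = F / np.linalg.norm(F, axis=1, keepdims=True)
        inputs = F / np.linalg.norm(F, axis=0, keepdims=True)
    return weights

For example, train_deep(X, [256, 128]) would greedily train two stacked layers of 256 and 128 features.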

Business Applications