Michael Data

Final Exam Notes

General Supervised Model

  • Learner
      • Contains the “model”
  • Evaluation or cost
      • based on output of Learner
  • Algorithm
      • based on output of cost function
      • tweaks parameters of the learner
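A minimal sketch of these three pieces, assuming a linear model with squared-error cost trained by gradient descent (illustrative Python, not from the notes):

```python
import numpy as np

# Learner: holds the "model" (here a linear model f(x) = w.x + b).
def predict(w, b, X):
    return X @ w + b

# Evaluation / cost: based on the learner's output (here mean squared error).
def cost(w, b, X, y):
    return np.mean((predict(w, b, X) - y) ** 2)

# Algorithm: uses the cost (via its gradient) to tweak the learner's parameters.
def fit(X, y, lr=0.01, steps=1000):
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        r = predict(w, b, X) - y            # residuals
        w -= lr * (2 / len(y)) * (X.T @ r)  # d(MSE)/dw
        b -= lr * 2 * r.mean()              # d(MSE)/db
    return w, b
```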

Types of Models

  • (Regression used throughout)
  • Nearest Neighbor
  • Decision Trees
  • Probability / Bayes classifiers
  • Linear Models / perceptrons
      • Perceptron algorithm
      • Logistic MSE
      • SVM w/ hinge loss
  • Neural Networks
  • Over/Under-fitting & Complexity
      • Features
          • Creation
          • Selection
          • “Kernel” methods
      • Data
          • Size of test/train sets
      • Other methods to control complexity
          • Regularization
          • Early stopping
          • Model parameters
          • Bagging
          • Boosting
      • Measure complexity
          • VC-Dimension
          • Shattering
      • Measure over/under-fitting
          • Use holdout data to perform validation tests (cross-validation); see the sketch after this list
              • Gives an independent estimate of the model performance
              • Allows selection/optimization of model parameters
          • Examine position in the test/train error curves
              • Gives a hint about over- or under-fitting
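A minimal cross-validation sketch for the holdout/validation item above, assuming scikit-learn is available and using ridge regression with its regularization strength as the model parameter being selected (names and values are illustrative):

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

def cv_error(X, y, alpha, k=5):
    """Average validation error over k folds: an independent estimate of performance."""
    errs = []
    for train_idx, val_idx in KFold(n_splits=k, shuffle=True, random_state=0).split(X):
        model = Ridge(alpha=alpha).fit(X[train_idx], y[train_idx])
        errs.append(mean_squared_error(y[val_idx], model.predict(X[val_idx])))
    return np.mean(errs)

# Model selection: pick the regularization strength with the lowest CV error.
def select_alpha(X, y, alphas=(0.01, 0.1, 1.0, 10.0)):
    return min(alphas, key=lambda a: cv_error(X, y, a))
```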

Unsupervised Learning

Used to “understand” the data. Creates a new simplified representation of the data.

Clustering

Output an assignment $Z^i$ for each data point $X^i$ that places it in one of $k$ clusters.
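A minimal sketch of such a clustering map using k-means (one possible clustering algorithm, assumed here for illustration; NumPy only):

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Return z, where z[i] assigns data point X[i] to one of k clusters."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # init from data points
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        z = np.argmin(((X[:, None, :] - centers) ** 2).sum(axis=-1), axis=1)
        # Update step: each center moves to the mean of its assigned points.
        centers = np.array([X[z == j].mean(axis=0) if np.any(z == j) else centers[j]
                            for j in range(k)])
    return z, centers
```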

Hypothesis Testing

Want to add a standard of rigor to testing.
$ x \to f(x) \to err(f) $
$ x \to g(x) \to err(g) $
What is the confidence in each error result?

  • Estimate the mean and variance of $err(f)$ and $err(g)$

Test whether $f < g$, i.e., whether $err(f) < err(g)$.

  • Null hypothesis: $H_0: f = g$
  • Alternative hypothesis: $H_1: f < g$

Compute a “score” that is large if $H_1$ is true but small if $H_0$ is true.

  • $p$-value: $\Pr(s > \text{observed score} \mid H_0)$
      • NOT the probability that $H_0$ or $H_1$ is true

Perform the comparison repeatedly: on each trial, is $err(f) < err(g)$? Then use a significance test such as Student's t-test to estimate the expectation of these comparisons and the confidence in that estimate. The data used to perform these tests must not have been used in the training process for either model.
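A sketch of such a test, assuming paired per-trial errors for $f$ and $g$ on held-out data (the numbers below are made up for illustration) and SciPy's paired t-test:

```python
import numpy as np
from scipy import stats

# Paired errors of f and g on the same held-out splits (illustrative values,
# not from the notes). This data was never used to train either model.
err_f = np.array([0.21, 0.19, 0.23, 0.20, 0.18])
err_g = np.array([0.24, 0.22, 0.25, 0.21, 0.23])

# Paired t-test; SciPy returns a two-sided p-value by default.
t, p_two_sided = stats.ttest_rel(err_f, err_g)

# One-sided p-value for H1: err(f) < err(g) (t should be negative if H1 holds).
p_one_sided = p_two_sided / 2 if t < 0 else 1 - p_two_sided / 2
# Small p-value => Pr(a score this extreme | H0) is low, so reject H0.
```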

Eigendecomposition and SVD

Eigendecomposition of the covariance matrix: $\Sigma = V \Lambda V'$

SVD of the data matrix: $X = U S V'$

$X'X = V S' U' U S V' = V S^2 V'$, since $U'U = I$.

So the right singular vectors $V$ of $X$ are the eigenvectors of $X'X$, and its eigenvalues are the squared singular values, $\Lambda = S^2$.
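A quick numerical check of this identity with NumPy (random data, purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))

# SVD of the data matrix: X = U S V'
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Eigendecomposition of X'X = V Lambda V'
eigvals, V = np.linalg.eigh(X.T @ X)

# Eigenvalues of X'X match the squared singular values of X (up to ordering).
assert np.allclose(np.sort(eigvals), np.sort(s ** 2))
```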

Ensemble Methods
Bias and Variance