- * Learner
- Contains the “model”
- * Evaluation or cost
- based on output of Learner
- * Algorithm
- based on output of cost function
- tweaks parameters of the learner

- * (Regression used throughout)
- * Nearest Neighbor
- * Decision Trees
- * Probability / Bayes classifiers
- * Linear Models / perceptrons
- Perceptron algorithm
- Logistic regression w/ MSE loss
- SVM w/ hinge loss
- * Neural Networks
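The perceptron algorithm listed above can be sketched in a few lines of numpy. The toy data here is hypothetical; the update rule itself (add $y \cdot x$ to the weights on a mistake) is the standard one:

```python
import numpy as np

# Toy linearly separable data (hypothetical): labels y in {-1, +1}.
X = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])

w = np.zeros(X.shape[1])  # weight vector (bias omitted for simplicity)
for _ in range(100):                 # epochs
    for xi, yi in zip(X, y):
        if yi * (w @ xi) <= 0:       # misclassified (or on the boundary)
            w += yi * xi             # perceptron update: w <- w + y*x

print(np.sign(X @ w))  # predictions match y once the data is separated
```

For separable data the loop stops making updates after finitely many mistakes (the perceptron convergence theorem); for non-separable data it cycles forever, which motivates the logistic and hinge losses above.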

- * Over/Under-fitting & Complexity
- Features
- Creation*
- Selection
- “Kernel” methods
- Data
- Size of test/train sets
- Other Methods to control complexity
- Regularization
- Early stopping
- Model parameters
- Bagging
- Boosting
- Measure complexity
- VC-Dimension
- Shattering
- Measure Over/Under-fitting*
- Use holdout data to perform validation tests (Cross-validation)
- Gives independent estimate of the model performance
- Allows selection/optimization of model parameters
- Examine position in the test/train error curves
- Gives hint about over- or under-fitting
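The cross-validation procedure above can be sketched with a minimal k-fold loop. The synthetic data and least-squares model here are illustrative choices, not the course's:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=60)

def kfold_mse(X, y, k=5):
    """Hold each fold out once; average the held-out MSE over the k splits."""
    idx = rng.permutation(len(X))
    folds = np.array_split(idx, k)
    errs = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        # Fit on the training folds only; the test fold stays unseen.
        w, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
        errs.append(np.mean((X[test] @ w - y[test]) ** 2))
    return np.mean(errs)

print(kfold_mse(X, y))  # independent estimate of generalization error
```

Because every point is held out exactly once, the averaged error is an estimate of performance on unseen data, and comparing it across hyperparameter settings is how model selection is done.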

Unsupervised Learning: used to “understand” the data by creating a new, simplified representation of it.

Clustering: output a map $Z^i$ that assigns each data point $X^i$ to one of $k$ clusters.
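One standard way to compute such an assignment map is Lloyd's k-means algorithm: alternate between assigning each point to its nearest center and moving each center to the mean of its points. A minimal sketch on synthetic two-blob data (the data and seed are assumptions for illustration):

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Lloyd's algorithm: alternate the assignment map z and centroid update."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # z[i] = index of the nearest center to data point X[i]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        z = d.argmin(axis=1)
        for j in range(k):
            if np.any(z == j):                  # skip empty clusters
                centers[j] = X[z == j].mean(axis=0)
    return z, centers

# Two well-separated blobs (synthetic): points 0-19 near 0, 20-39 near 5.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(5, 0.3, (20, 2))])
z, centers = kmeans(X, k=2)
```

The returned `z` is exactly the map $Z^i$: an integer cluster index per data point.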

We want a standard of rigor for testing. Given two models with pipelines $x \to f(x) \to err(f)$ and $x \to g(x) \to err(g)$: what is the confidence in each error result?

- * Estimate mean and variance of $err(x)$

Test whether $f$ beats $g$, i.e. whether $err(f) < err(g)$:

- * Null hypothesis: $H_0: f = g$
- * Alt. hypothesis: $H_1: f < g$

Compute a “score” $s$ that is large if $H_1$ is true, but small if $H_0$ is true.

- * $p$-value: $\Pr(s > \text{observed score} \mid H_0)$
- NOT the probability that $H_0$ or $H_1$ is true

Perform the test repeatedly: on each held-out split, is $err(f) < err(g)$? Then apply a significance test such as Student's t-test to the collected differences to estimate their expectation. The data used for these tests must not have been used in the training process for either model.
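A paired t statistic on per-fold error differences makes this concrete. The error values below are hypothetical; the statistic is the standard paired one:

```python
import numpy as np

# Hypothetical per-fold test errors for models f and g, measured on the
# same held-out folds (data never used to train either model).
err_f = np.array([0.12, 0.10, 0.15, 0.11, 0.13])
err_g = np.array([0.16, 0.14, 0.15, 0.17, 0.15])

d = err_f - err_g                                  # paired differences
t = d.mean() / (d.std(ddof=1) / np.sqrt(len(d)))   # paired t statistic
# Under H0 (f = g), t follows a t-distribution with len(d)-1 dof;
# a large negative t supports H1: err(f) < err(g).
print(t)
```

Pairing by fold matters: both models see the same splits, so fold-to-fold difficulty cancels out of the differences and the test is more sensitive than comparing the two error lists independently.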

$\Sigma = V \Lambda V'$

$X = USV'$

$X'X = VS'U'USV'$

$X'X = V (S^2) V' $
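The derivation above (substitute $X = USV'$ into $X'X$ and use $U'U = I$) says the eigenvalues of $X'X$ are the squared singular values of $X$, with the same eigenvectors $V$. A quick numerical check on a random matrix (the data is arbitrary; the identity holds for any $X$):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 4))

# SVD of the data matrix: X = U S V'
U, S, Vt = np.linalg.svd(X, full_matrices=False)

# Eigendecomposition of X'X; since U'U = I, X'X = V S^2 V',
# so its eigenvalues must equal the squared singular values.
evals = np.linalg.eigvalsh(X.T @ X)

print(np.sort(S**2))
print(np.sort(evals))  # same spectrum
```

This is why PCA is usually computed via the SVD of $X$ rather than by forming $\Sigma$ explicitly: the eigenvectors of the covariance are the right singular vectors $V$, and forming $X'X$ squares the condition number.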