# Final Exam Notes

## General Supervised Model

* Learner
  * Contains the "model"
* Evaluation or cost
  * Based on the output of the learner
* Algorithm
  * Based on the output of the cost function
  * Tweaks the parameters of the learner

## Types of Models

* (Regression used throughout)

* Nearest Neighbor

* Decision Trees

* Probability / Bayes classifiers

* Linear Models / perceptrons
  * Perceptron algorithm
  * Logistic MSE
  * SVM w/ hinge loss
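The perceptron algorithm listed above can be sketched in a few lines; this is a minimal NumPy version, and the toy (linearly separable) data, learning setup, and epoch cap are illustrative:

```python
import numpy as np

# Toy linearly separable data; labels in {-1, +1} (illustrative values)
X = np.array([[2.0, 1.0], [1.5, 2.0], [-1.0, -1.5], [-2.0, -0.5]])
y = np.array([1, 1, -1, -1])

w = np.zeros(X.shape[1])  # weight vector
b = 0.0                   # bias term

# Perceptron rule: on each mistake, nudge (w, b) toward the misclassified point
for epoch in range(100):
    mistakes = 0
    for xi, yi in zip(X, y):
        if yi * (xi @ w + b) <= 0:  # misclassified (or on the boundary)
            w += yi * xi
            b += yi
            mistakes += 1
    if mistakes == 0:               # converged: every point classified correctly
        break

print(np.sign(X @ w + b))  # predictions match y once converged
```

On separable data the loop is guaranteed to terminate; on non-separable data the epoch cap is what stops it.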

* Neural Networks

* Over/Under-fitting & Complexity
  * Features
    * Creation
    * Selection
    * "Kernel" methods
  * Data
    * Size of test/train sets
  * Other methods to control complexity
    * Regularization
    * Early stopping
    * Model parameters
    * Bagging
    * Boosting
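Of the complexity controls listed above, regularization is the easiest to sketch. Below is a minimal ridge-regression example in NumPy; the data, the true weights, and the penalty value `lam = 10` are all illustrative:

```python
import numpy as np

# Ridge regularization: penalize large weights to control model complexity.
# Closed form: w = (X^T X + lam * I)^{-1} X^T y
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
true_w = np.array([1.0, -2.0, 0.0, 0.0, 3.0])   # made-up ground truth
y = X @ true_w + 0.1 * rng.normal(size=20)       # noisy targets

def ridge(X, y, lam):
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

w_unreg = ridge(X, y, 0.0)   # lam = 0 recovers ordinary least squares
w_reg = ridge(X, y, 10.0)    # lam > 0 shrinks the weights toward zero

print(np.linalg.norm(w_reg) < np.linalg.norm(w_unreg))  # True: shrinkage
```

Larger `lam` means smaller weights and a simpler (higher-bias, lower-variance) model; `lam` itself is a model parameter you would pick by validation.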

* Measure complexity
  * VC-Dimension
  * Shattering
* Measure Over/Under-fitting
  * Use holdout data to perform validation tests (cross-validation)
    * Gives an independent estimate of the model's performance
    * Allows selection/optimization of model parameters
  * Examine position on the test/train error curves
    * Gives a hint about over- or under-fitting

## Unsupervised Learning


Used to “understand” the data. Creates a new simplified representation of the data.

### Clustering

## Hypothesis Testing

Want to add a standard of rigor to testing.

What is the confidence in each error result?

* Estimate the mean and variance of the test error

* Null hypothesis $H_0$: the two models have the same expected error
* Alt. hypothesis $H_1$: the two models' expected errors differ

Compute a "score" (test statistic) that is large if $H_1$ is true, but low if $H_0$ is true.

* The score is NOT the probability that $H_0$ or $H_1$ is true

Perform the test repeatedly: is $e_1 < e_2$? Then you can use a confidence test such as Student's t-test to estimate the expectation of these tests. The data used to perform these tests must not have been used in the training process for either model.
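A paired comparison of this kind can be sketched as follows; the per-split error values here are made up for illustration, and a real run would collect them from held-out data as described above:

```python
import numpy as np

# Paired t-test sketch: compare per-split test errors of two models.
errs_a = np.array([0.12, 0.10, 0.14, 0.11, 0.13])  # model A (illustrative)
errs_b = np.array([0.18, 0.16, 0.17, 0.19, 0.15])  # model B (illustrative)

d = errs_a - errs_b                              # paired differences
n = len(d)
t = d.mean() / (d.std(ddof=1) / np.sqrt(n))      # Student's t statistic

# Under H0 (equal expected error), t follows a t-distribution with n-1
# degrees of freedom; |t| beyond ~2.78 (alpha = 0.05, 4 dof, two-sided)
# lets us reject H0 in favor of a real difference between the models.
print(t)
```

Note that the splits must pair up (same held-out data for both models per entry); otherwise the differences `d` are not meaningful and an unpaired test would be needed.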

## Eigendecomposition and SVD

## Ensemble Methods

## Bias and Variance