# Michael Data

#### General Supervised Model

* Learner
    * Contains the “model”
* Evaluation or cost function
    * Based on the output of the learner
* Algorithm
    * Based on the output of the cost function
    * Tweaks the parameters of the learner
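As an illustrative sketch (names and the toy problem are assumptions, not from the notes), the three pieces above can be wired together as gradient descent on a 1-D linear model: the learner makes predictions, the cost evaluates them, and the algorithm uses the cost's gradient to tweak the learner's parameters.

```python
# Sketch of learner / cost / algorithm for 1-D linear regression.

def learner(x, w, b):          # the "model": a linear predictor
    return w * x + b

def cost(data, w, b):          # evaluation: mean squared error over the data
    return sum((learner(x, w, b) - y) ** 2 for x, y in data) / len(data)

def algorithm(data, w, b, lr=0.05, steps=500):
    """Gradient descent: uses the cost's gradient to tweak the learner's parameters."""
    n = len(data)
    for _ in range(steps):
        dw = sum(2 * (learner(x, w, b) - y) * x for x, y in data) / n
        db = sum(2 * (learner(x, w, b) - y) for x, y in data) / n
        w, b = w - lr * dw, b - lr * db
    return w, b

data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]  # generated by y = 2x + 1
w, b = algorithm(data, w=0.0, b=0.0)
```

Note the separation: swapping the cost (e.g., hinge loss) or the algorithm (e.g., a different optimizer) leaves the learner untouched.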

### Types of Models

* (Regression used throughout)
* Nearest Neighbor
* Decision Trees
* Probability / Bayes classifiers
* Linear Models / Perceptrons
    * Perceptron algorithm
    * Logistic regression (MSE)
    * SVM with hinge loss
* Neural Networks
* Over-/Under-fitting & Complexity
    * Features
        * Creation
        * Selection
        * “Kernel” methods
    * Data
        * Size of test/train sets
    * Other methods to control complexity
        * Regularization
        * Early stopping
        * Model parameters
        * Bagging
        * Boosting
    * Measuring complexity
        * VC-Dimension
        * Shattering
    * Measuring over-/under-fitting
        * Use holdout data to perform validation tests (cross-validation)
            * Gives an independent estimate of model performance
            * Allows selection/optimization of model parameters
        * Examine position on the test/train error curves
            * Gives a hint about over- or under-fitting
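A minimal k-fold cross-validation sketch (the helper names and the trivial "mean" model are illustrative assumptions, not from the notes): each held-out fold gives an independent error estimate, and the fold average estimates model performance.

```python
# Minimal k-fold cross-validation: train on k-1 folds, evaluate on the
# held-out fold, and average the held-out errors.

def k_fold_splits(data, k):
    folds = [data[i::k] for i in range(k)]           # simple interleaved folds
    for i in range(k):
        test = folds[i]
        train = [x for j, f in enumerate(folds) if j != i for x in f]
        yield train, test

def cross_val_error(data, k, fit, error):
    scores = []
    for train, test in k_fold_splits(data, k):
        model = fit(train)                           # fit on k-1 folds only
        scores.append(error(model, test))            # score on the held-out fold
    return sum(scores) / k                           # average held-out error

# Toy example: the "model" is just the training mean; error is mean squared deviation.
fit = lambda train: sum(train) / len(train)
error = lambda m, test: sum((x - m) ** 2 for x in test) / len(test)
data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
cv_err = cross_val_error(data, k=3, fit=fit, error=error)
```

Because every error is measured on data the model never saw during fitting, `cv_err` can also be used to select model parameters without touching a final test set.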

#### Unsupervised Learning

Used to “understand” the data: it creates a new, simplified representation of the data.

### Clustering

Outputs a map that assigns each data point to one of a set of clusters.
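As an illustrative sketch (not from the notes), k-means produces exactly such a map: each point is assigned to its nearest cluster center, and the centers are refit to their assigned points.

```python
# Minimal 1-D k-means: alternate between assigning points to the nearest
# center and moving each center to the mean of its assigned points.

def kmeans(points, centers, iters=20):
    for _ in range(iters):
        # Assignment map: a cluster index for each data point.
        assign = [min(range(len(centers)), key=lambda c: (p - centers[c]) ** 2)
                  for p in points]
        # Update step: move each center to the mean of its members.
        for c in range(len(centers)):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centers[c] = sum(members) / len(members)
    return assign, centers

points = [0.0, 0.5, 1.0, 9.0, 9.5, 10.0]
assign, centers = kmeans(points, centers=[0.0, 5.0])
```

The returned `assign` list is the cluster map itself; the centers are the simplified representation of the data.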

##### Hypothesis Testing

We want to add a standard of rigor to testing: what is the confidence in each error result?

* Estimate the mean and variance of the test error
* Null hypothesis H₀; alternative hypothesis H₁
* Compute a “score” that is large if H₁ is true but low if H₀ is true
    * The score is NOT the probability that H₀ or H₁ is true

Perform the test repeatedly (e.g., on each held-out example, is one model's error smaller than the other's?). Then you can use a confidence test such as Student's t-test to estimate the expectation of these tests. The data used to perform these tests must not have been used in the training process for either model.

##### Eigendecomposition and SVD

##### Bias and Variance