Michael Data

Most learning algorithms can be described as a combination of:
* Model
* Objective Function
* Optimization Method
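As a minimal sketch of this decomposition (not any particular library's design), the three pieces below are separate, swappable Python functions. They are a hypothetical one-feature linear model, a sum-of-squared-errors objective, and plain gradient descent as the optimization method. The names `linear_model`, `sse`, `sse_grad` and `gradient_descent`, and the step size and iteration count, are illustrative assumptions.

```python
from typing import Callable, List, Sequence

# Model: f(x; theta) = theta[0] + theta[1] * x (hypothetical one-feature linear model).
def linear_model(x: float, theta: Sequence[float]) -> float:
    return theta[0] + theta[1] * x

# Objective function: sum of squared errors over the training pairs.
def sse(theta: Sequence[float], xs: Sequence[float], ys: Sequence[float]) -> float:
    return sum((y - linear_model(x, theta)) ** 2 for x, y in zip(xs, ys))

# Gradient of the SSE objective with respect to theta, for the linear model above.
def sse_grad(theta: Sequence[float], xs: Sequence[float], ys: Sequence[float]) -> List[float]:
    g0, g1 = 0.0, 0.0
    for x, y in zip(xs, ys):
        residual = y - linear_model(x, theta)
        g0 += -2.0 * residual
        g1 += -2.0 * residual * x
    return [g0, g1]

# Optimization method: gradient descent with a fixed step size.
def gradient_descent(
    grad: Callable[[List[float]], List[float]],
    theta0: Sequence[float],
    step: float = 0.01,
    iters: int = 5000,
) -> List[float]:
    theta = list(theta0)
    for _ in range(iters):
        g = grad(theta)
        theta = [t - step * gi for t, gi in zip(theta, g)]
    return theta

# Usage: fit y = 1 + 2x from four noise-free points.
xs, ys = [0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0]
theta = gradient_descent(lambda th: sse_grad(th, xs, ys), [0.0, 0.0])
print(theta, sse(theta, xs, ys))
```

Because each piece is its own function, one of them (say, the optimizer) can be replaced without touching the other two.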

Techniques

Classification
* Detect spam email
* Classify sentiment of a product review

Regression
* Predict a real-valued number

Ranking
* Find most likely candidates from a group

Notation

Training data: (features, label) pairs $(x_i, y_i)$, typically represented as a table with one row per example.

We want to learn a model that predicts a label given features.
The model is typically represented as a function $\hat{y} = f(x; \theta)$, where $x$ are the input features and $\theta$ is a vector of parameters.

The quality of a model is evaluated with an error function.
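* e.g. sum of squared error over the $N$ training pairs $(x_i, y_i)$: $E(\theta) = \sum_{i=1}^{N} \left(y_i - f(x_i; \theta)\right)^2$.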

Empirical Learning

The goal is to minimize the total error on the training data, which is an optimization problem. Occasionally there is a direct solution, e.g. via linear algebra (as for least-squares linear regression). Typically some kind of gradient-based method is needed.
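For least-squares linear regression both routes are available, so a sketch can compare them: the direct solution via NumPy's `np.linalg.lstsq`, and a gradient-descent loop on the mean squared error. The synthetic data, step size and iteration count are assumptions chosen so that the loop converges.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
# Design matrix: a column of ones (intercept) plus one feature.
X = np.column_stack([np.ones(n), rng.uniform(-1.0, 1.0, size=n)])
true_theta = np.array([0.5, -2.0])
y = X @ true_theta + 0.1 * rng.normal(size=n)

# Direct solution: least squares via linear algebra.
theta_direct, *_ = np.linalg.lstsq(X, y, rcond=None)

# Gradient approach: gradient descent on the mean squared error.
theta_gd = np.zeros(2)
step = 0.1
for _ in range(5000):
    grad = 2.0 / n * X.T @ (X @ theta_gd - y)
    theta_gd -= step * grad

print(theta_direct, theta_gd)
```

Both land on the same parameters here; the gradient approach matters for models (logistic regression, neural networks) where no closed form exists.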

Overfitting

Minimizing training error does not necessarily give the best predictions on future data. When test error starts increasing during training even as training error keeps falling, the model is overfitting. One way to control overfitting is to switch to a simpler model.
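A small sketch of the effect, under assumptions not in the original: a noisy sine target, 15 training points, and polynomial models of increasing degree fitted with NumPy's `Polynomial.fit`. Training error keeps falling as the degree grows, while error on held-out data typically stops improving once the model is flexible enough to fit the noise.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(1)

def target(x):
    # Hypothetical "true" function the data is generated from.
    return np.sin(2 * np.pi * x)

x_train = np.linspace(0.0, 1.0, 15)
y_train = target(x_train) + 0.2 * rng.normal(size=x_train.size)
x_test = np.linspace(0.0, 1.0, 200)
y_test = target(x_test) + 0.2 * rng.normal(size=x_test.size)

def mse(p, x, y):
    return float(np.mean((p(x) - y) ** 2))

results = {}
for degree in (1, 3, 9):
    p = Polynomial.fit(x_train, y_train, degree)
    results[degree] = (mse(p, x_train, y_train), mse(p, x_test, y_test))
    train_err, test_err = results[degree]
    print(f"degree {degree}: train MSE {train_err:.4f}, test MSE {test_err:.4f}")
```

Picking the degree with the lowest held-out error, rather than the lowest training error, is one concrete way to "switch to a simpler model".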

Bias and Variance

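In practice, predictive models are limited by the Bias-Variance Tradeoff: simpler models tend to have high bias (systematic error from overly restrictive assumptions), while more flexible models tend to have high variance (sensitivity to the particular training set).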
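The tradeoff can be estimated empirically. The sketch below works under illustrative assumptions: fixed inputs, a sine target with Gaussian noise, and 200 simulated training sets. It refits a degree-1 and a degree-9 polynomial on each simulated set. Squared bias measures how far the average prediction is from the true function. Variance measures how much predictions change from one training set to the next.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(2)
x = np.linspace(0.0, 1.0, 20)          # fixed training inputs
x_eval = np.linspace(0.05, 0.95, 50)   # points where bias and variance are measured
f_true = np.sin(2 * np.pi * x_eval)
noise_sd = 0.3
n_trials = 200

def bias_variance(degree):
    preds = np.empty((n_trials, x_eval.size))
    for t in range(n_trials):
        # Draw a fresh noisy training set and refit the model.
        y = np.sin(2 * np.pi * x) + noise_sd * rng.normal(size=x.size)
        preds[t] = Polynomial.fit(x, y, degree)(x_eval)
    mean_pred = preds.mean(axis=0)
    bias_sq = float(np.mean((mean_pred - f_true) ** 2))
    variance = float(np.mean(preds.var(axis=0)))
    return bias_sq, variance

results = {}
for degree in (1, 9):
    results[degree] = bias_variance(degree)
    print(f"degree {degree}: bias^2 {results[degree][0]:.4f}, variance {results[degree][1]:.4f}")
```

The straight line has large bias but small variance. The degree-9 polynomial reverses that.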

Models

* Linear weighted sums of the input variables (linear regression).
* Non-linear functions of linear weighted sums (logistic regression, neural networks, GLMs).
* Thresholded functions (decision trees).
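A minimal sketch of these model forms, written as plain Python functions of a feature vector `x`. Several details are illustrative assumptions: the parameter names (`w`, `b`, `W1`, `threshold`, etc.), the `tanh` hidden layer, and the single-split "stump" that stands in for a full decision tree.

```python
import math
from typing import Sequence

# Linear weighted sum of the inputs (linear regression).
def linear(x: Sequence[float], w: Sequence[float], b: float) -> float:
    return sum(wi * xi for wi, xi in zip(w, x)) + b

# Non-linear function of a linear weighted sum (logistic regression).
def logistic(x: Sequence[float], w: Sequence[float], b: float) -> float:
    return 1.0 / (1.0 + math.exp(-linear(x, w, b)))

# Non-linear functions of linear sums, then another linear sum
# (a one-hidden-layer neural network).
def one_hidden_layer(x, W1, b1, w2, b2) -> float:
    hidden = [math.tanh(linear(x, w_row, b_j)) for w_row, b_j in zip(W1, b1)]
    return linear(hidden, w2, b2)

# Thresholded function: a depth-1 decision tree ("stump").
def stump(x: Sequence[float], feature: int, threshold: float, left, right):
    return left if x[feature] <= threshold else right

print(linear([1.0, 2.0], [0.5, -1.0], 0.25))
print(logistic([0.0, 0.0], [1.0, 1.0], 0.0))
print(one_hidden_layer([1.0, -1.0], [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0], [1.0, 1.0], 0.0))
print(stump([3.0, 7.0], 1, 5.0, "low", "high"))
```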

To improve a model, it is important to measure its performance.
Compare against a baseline: relative to the baseline's error rate, measure the reduction in error gained by switching to the model in question, and establish that this reduction is not due to random chance.
e.g. in classification, the simplest baseline is to always predict the most likely class, ignoring the features $x$.
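A minimal sketch of the comparison in plain Python, using made-up labels. The helper names are hypothetical. McNemar's exact test is one common choice (not one the notes specify) for checking whether two classifiers scored on the same examples differ by more than chance.

```python
from collections import Counter
from math import comb

def majority_class(y_train):
    # Most frequent training label; the features are ignored entirely.
    return Counter(y_train).most_common(1)[0][0]

def error_rate(y_true, y_pred):
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)

def relative_error_reduction(baseline_err, model_err):
    # Fraction of the baseline's errors removed by the model.
    return (baseline_err - model_err) / baseline_err

def mcnemar_exact_p(y_true, pred_a, pred_b):
    # b: A right and B wrong; c: A wrong and B right. Ties carry no information.
    b = sum(a == t and p != t for t, a, p in zip(y_true, pred_a, pred_b))
    c = sum(a != t and p == t for t, a, p in zip(y_true, pred_a, pred_b))
    n = b + c
    if n == 0:
        return 1.0
    # Two-sided exact binomial test of b vs c under p = 0.5.
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)

y_train = [0, 0, 0, 1, 0, 1, 0, 1]
y_test = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
model_pred = [0, 0, 0, 0, 0, 1, 1, 1, 1, 0]
baseline_pred = [majority_class(y_train)] * len(y_test)

base_err = error_rate(y_test, baseline_pred)
model_err = error_rate(y_test, model_pred)
print(base_err, model_err, relative_error_reduction(base_err, model_err))
print(mcnemar_exact_p(y_test, baseline_pred, model_pred))
```

Here the model halves the baseline's error, but with so few test examples the p-value is large, so the improvement could easily be chance.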
Alternatively, examining a confusion matrix can help explain the classifier's mistakes and reveal patterns in them.
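A confusion matrix is a table of counts indexed by (true class, predicted class). The minimal sketch below uses made-up spam/ham labels. Rows are true classes and columns are predicted classes, which is an assumed convention, since some tools transpose it.

```python
from collections import Counter

def confusion_matrix(y_true, y_pred, labels):
    # counts[(t, p)] = number of examples with true label t predicted as p.
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]

y_true = ["spam", "ham", "ham", "spam", "ham"]
y_pred = ["spam", "ham", "spam", "ham", "ham"]
labels = ["ham", "spam"]
cm = confusion_matrix(y_true, y_pred, labels)
for label, row in zip(labels, cm):
    print(label, row)
```

Off-diagonal cells show which confusions occur. Here one ham message is flagged as spam, and one spam message slips through.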

Objective Functions

Regression
* Squared Error (L2)
* Absolute Error (L1)
* Robust loss, log-loss, log-likelihood
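Minimal NumPy sketches of these regression objectives, summed over examples. Two choices here are assumptions rather than part of the notes. Huber loss stands in as one example of a robust loss, with its `delta` threshold. The log-likelihood version assumes Gaussian noise with a known standard deviation `sigma`.

```python
import numpy as np

def squared_error(y, y_hat):
    # L2: sum of squared residuals; large residuals dominate.
    return float(np.sum((np.asarray(y) - np.asarray(y_hat)) ** 2))

def absolute_error(y, y_hat):
    # L1: sum of absolute residuals; grows only linearly with residual size.
    return float(np.sum(np.abs(np.asarray(y) - np.asarray(y_hat))))

def huber_loss(y, y_hat, delta=1.0):
    # Quadratic for residuals up to delta, linear beyond it (one robust loss).
    r = np.abs(np.asarray(y) - np.asarray(y_hat))
    return float(np.sum(np.where(r <= delta, 0.5 * r ** 2, delta * (r - 0.5 * delta))))

def gaussian_nll(y, y_hat, sigma=1.0):
    # Negative log-likelihood of y under y ~ N(y_hat, sigma^2).
    r = np.asarray(y) - np.asarray(y_hat)
    return float(np.sum(0.5 * np.log(2 * np.pi * sigma ** 2) + r ** 2 / (2 * sigma ** 2)))

y = [1.0, 2.0, 3.0]
y_hat = [1.5, 2.0, 0.0]  # the last prediction is far off, like an outlier
for loss in (squared_error, absolute_error, huber_loss, gaussian_nll):
    print(loss.__name__, loss(y, y_hat))
```

The outlying residual of 3 contributes 9 to the squared error but only 2.5 to the Huber loss. With `sigma` fixed, minimizing the Gaussian negative log-likelihood gives the same parameters as minimizing squared error, since the two differ only by a constant and a positive scale factor.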

Classification
* Classification error
* Margin
* Log-loss, log-likelihood
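Minimal sketches of the classification objectives. Hinge loss is used as one example of a margin-based loss, since the notes only say "margin". The label encodings ({-1, +1} for the margin, {0, 1} for log-loss) and the clipping constant `eps` are assumptions.

```python
import math

def classification_error(y_true, y_pred):
    # Fraction of examples whose predicted label is wrong (0-1 loss).
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)

def hinge_loss(y_pm1, scores):
    # Margin-based loss for labels in {-1, +1}: zero once y * score >= 1.
    return sum(max(0.0, 1.0 - y * s) for y, s in zip(y_pm1, scores)) / len(y_pm1)

def log_loss(y01, probs, eps=1e-12):
    # Negative mean log-likelihood of labels in {0, 1} given predicted P(y = 1).
    total = 0.0
    for y, p in zip(y01, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y01)

y_pm1 = [1, -1, 1]
scores = [2.0, -0.5, -1.0]
preds = [1 if s >= 0 else -1 for s in scores]
print(classification_error(y_pm1, preds))
print(hinge_loss(y_pm1, scores))
print(log_loss([1, 0, 1], [0.9, 0.2, 0.5]))
```

The hinge loss still penalizes the second example: it is classified correctly but falls inside the margin (y * score = 0.5 < 1). The 0-1 error counts only the outright mistake on the third example.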