**Ensemble Methods** combine the predictions of multiple trained models (base learners) to improve performance over any single one.

**Bagging** (“Bootstrap Aggregation”)

Bagging is a method to reduce over-fitting (variance). It won't help if the base model doesn't already over-fit. It relies on **bootstrap** sampling: each base model is trained on a random resample of the training data, drawn with replacement.

- Repeat K times:
  - Bootstrap a training set of size N': sample N' data points from the original training set, with replacement
  - Train a classifier on this set
- To test, run each classifier:
  - For classification, use a voting method (e.g., majority vote) for the final prediction
  - For regression, use a function of the classifier outputs (e.g., their mean)
- Plain bagging weights all base classifiers equally; to weight them, use Boosting
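A minimal sketch of the bagging loop in plain Python. It assumes a toy base classifier: the hypothetical `train_stump` (a one-level decision tree) stands in for whatever classifier you would actually bag. `K` and `n_prime` correspond to K and N' above, and the final prediction is a majority vote.

```python
import random
from collections import Counter


def train_stump(X, y):
    """Hypothetical toy base classifier: a one-level decision tree.
    Picks the (feature, threshold) split with the fewest training errors,
    predicting the majority label on each side."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[j] <= t]
            right = [lab for row, lab in zip(X, y) if row[j] > t]
            left_lab = Counter(left).most_common(1)[0][0]
            right_lab = Counter(right).most_common(1)[0][0] if right else left_lab
            errors = sum(l != left_lab for l in left) + sum(r != right_lab for r in right)
            if best is None or errors < best[0]:
                best = (errors, j, t, left_lab, right_lab)
    _, j, t, left_lab, right_lab = best
    return lambda x: left_lab if x[j] <= t else right_lab


def bag(X, y, K=25, n_prime=None, seed=0):
    """Train K base classifiers, each on a bootstrap sample of size N'."""
    rng = random.Random(seed)
    n_prime = n_prime or len(X)
    models = []
    for _ in range(K):
        # Sample N' indices with replacement (the bootstrap).
        idx = [rng.randrange(len(X)) for _ in range(n_prime)]
        models.append(train_stump([X[i] for i in idx], [y[i] for i in idx]))
    return models


def predict_vote(models, x):
    """Classification: majority vote over the base classifiers."""
    return Counter(m(x) for m in models).most_common(1)[0][0]


X = [[0], [1], [2], [3], [4], [5]]
y = [0, 0, 0, 1, 1, 1]
models = bag(X, y, K=25)
print(predict_vote(models, [0]), predict_vote(models, [5]))
```

For regression, the same loop applies with a regression base learner, replacing the vote with the mean of the outputs.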

**Random Forests**: bagging applied to decision trees.

- Problem: plain bootstrapping doesn't work well for constructing forests from very large data sets
  - Why: when the data set is large, the bootstrap samples are very similar, so the trees tend to make the same splits (they are highly correlated)
  - Solution: also randomly subsample the features available at each decision node
    - The conventional approach is to draw a new subset of features for each node as it is constructed
    - Alternatively, “block” an entire tree from using a feature. This is more efficient, but less effective
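Per-node feature subsampling can be sketched as the split-finding step of a single tree node. This is an illustrative sketch that assumes Gini impurity as the split criterion; `best_split_random_features` is a hypothetical helper, and `m` is the number of features drawn per node (a common default for classification is roughly √d of the d features).

```python
import random


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for lab in labels:
        counts[lab] = counts.get(lab, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split_random_features(X, y, m, rng):
    """Find the best (feature, threshold) for ONE node, searching only
    m features drawn at random for this node (hypothetical helper)."""
    d = len(X[0])
    features = rng.sample(range(d), m)  # fresh random subset at every node
    best = None
    for j in features:
        for t in sorted({row[j] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[j] <= t]
            right = [lab for row, lab in zip(X, y) if row[j] > t]
            if not right:  # not a real split
                continue
            # Size-weighted Gini impurity of the two children.
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, j, t)
    return best  # (impurity, feature index, threshold), or None if no split


X = [[0, 5], [1, 5], [2, 5], [3, 5]]
y = [0, 0, 1, 1]
print(best_split_random_features(X, y, m=2, rng=random.Random(0)))
```

The “block” variant would instead draw the feature subset once per tree and reuse it at every node, rather than calling `rng.sample` per node.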

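**Boosting**: a method to increase model complexity (reduce bias), using a weighted combination of learners.

- Each new learner focuses on the examples the previous learners get wrong (e.g., AdaBoost up-weights misclassified training examples before training the next learner).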
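As one concrete instance, discrete AdaBoost (a specific boosting algorithm, swapped in here for illustration) reweights the training examples after each round so misclassified ones count more, and weights each learner by α = ½ ln((1 − err)/err). This sketch assumes labels in {−1, +1} and decision stumps as learners; the helper names are hypothetical.

```python
import math


def fit_weighted_stump(X, y, w):
    """Pick the (feature, threshold, polarity) stump with the lowest
    weighted error. Polarity s means: predict s if x[j] <= t, else -s."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for s in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if (s if row[j] <= t else -s) != yi)
                if best is None or err < best[0]:
                    best = (err, j, t, s)
    return best


def adaboost(X, y, rounds=10):
    n = len(X)
    w = [1.0 / n] * n  # start with uniform example weights
    ensemble = []
    for _ in range(rounds):
        err, j, t, s = fit_weighted_stump(X, y, w)
        err = max(err, 1e-10)  # guard against division by zero for a perfect stump
        if err >= 0.5:  # no better than chance: stop
            break
        alpha = 0.5 * math.log((1 - err) / err)  # this learner's vote weight
        ensemble.append((alpha, j, t, s))
        # Up-weight misclassified examples, down-weight correct ones.
        w = [wi * math.exp(-alpha * yi * (s if row[j] <= t else -s))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, x):
    """Sign of the alpha-weighted sum of stump votes."""
    score = sum(alpha * (s if x[j] <= t else -s) for alpha, j, t, s in ensemble)
    return 1 if score >= 0 else -1


X = [[0], [1], [2]]
y = [1, -1, 1]
ens = adaboost(X, y, rounds=3)
print([predict(ens, x) for x in X])
```

On this data no single stump classifies all three points correctly, but the weighted combination of three stumps does, which is the "increase complexity" effect described above.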

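**Gradient Boosting**: a boosting technique for regression. It is called “gradient” boosting because the error residuals, y − f(x), are the negative gradient of the squared-error (MSE) loss with respect to the prediction f(x), so fitting each new learner to the residuals amounts to a gradient-descent step.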
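Written out for the squared-error loss (the ½ factor is a convention that makes the gradient exactly the residual), the residual/gradient link and the additive update look as follows. The step size ν is an assumption added for illustration; ν = 1 gives the unshrunk update.

$$
L\big(y, F(x)\big) = \tfrac{1}{2}\big(y - F(x)\big)^2,
\qquad
r = -\frac{\partial L}{\partial F(x)} = y - F(x)
$$

$$
F_m(x) = F_{m-1}(x) + \nu\, h_m(x), \qquad h_m \text{ fit to the residuals } r = y - F_{m-1}(x)
$$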

- Learn a regression predictor
- Compute the error residual
- Learn to predict the residual, add that predictor to the ensemble, and repeat

The error residual gives later predictors a natural target to train on: it steers their attention away from data that are already well predicted. With each iteration, the variance of the error residual (on the training data) should decrease, and the residual should become more uniform.
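A small sketch of the gradient boosting loop in plain Python. It assumes 1-D inputs, regression stumps as the learners, the training-set mean as the initial predictor, and a learning rate `lr`; these are illustrative choices, not taken from the notes. It also records the training MSE of the residual after each round.

```python
def fit_regression_stump(x, r):
    """Fit a 1-D regression stump to targets r: choose the threshold whose
    two sides, each predicted by its mean, give the lowest squared error."""
    best = None
    for t in sorted(set(x)):
        left = [ri for xi, ri in zip(x, r) if xi <= t]
        right = [ri for xi, ri in zip(x, r) if xi > t]
        lm = sum(left) / len(left)
        rm = sum(right) / len(right) if right else lm
        sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda xi: lm if xi <= t else rm


def gradient_boost(x, y, rounds=50, lr=0.1):
    f0 = sum(y) / len(y)  # initial predictor: a constant (the mean), an assumption
    pred = [f0] * len(y)
    learners, history = [], []
    for _ in range(rounds):
        # Residual = negative gradient of the squared-error loss.
        residual = [yi - pi for yi, pi in zip(y, pred)]
        h = fit_regression_stump(x, residual)  # learn to predict the residual
        learners.append(h)
        pred = [pi + lr * h(xi) for xi, pi in zip(x, pred)]
        history.append(sum((yi - pi) ** 2 for yi, pi in zip(y, pred)) / len(y))

    def model(xi):
        return f0 + lr * sum(h(xi) for h in learners)

    return model, history


x = [0, 1, 2, 3, 4, 5]
y = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0]
model, history = gradient_boost(x, y)
print(history[0], history[-1])
```

Because each stump is a least-squares fit to the current residual, the update with 0 < `lr` < 2 can only lower the training MSE, which matches the shrinking residual described above.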