The **bias-variance tradeoff** is a practical issue that arises in essentially all supervised predictive modeling problems, regression in particular.

There is a fundamental tradeoff between bias and variance: reducing one tends to increase the other. (This is related in spirit to, but distinct from, the **no free lunch** theorem, which states that no learning algorithm outperforms all others when averaged over all possible problems.)

**Overparameterized** or over-complex models tend to suffer from high variance. Over-simplistic models tend to suffer from high bias.

a.k.a. **approximation error**.

Bias: the expected difference between our model's predictions and the true targets, i.e., how far the average fitted model is from $\mathbb{E}[y \mid x]$. It can be reduced by making the model more complex and flexible. A high-bias model is one with few parameters, e.g., a linear predictor; a low-bias model is one with many parameters, e.g., a large neural network.

a.k.a. **estimation error**.

Variance: the variability in the model's predictions across different training sets. It can be reduced by increasing the number of observations.
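A minimal sketch of this last point, assuming numpy (the true slope of 2, the noise level, and the sample sizes are made-up illustration values): the spread of a fitted slope across many resampled training sets shrinks as the number of observations grows.

```python
import numpy as np

rng = np.random.default_rng(0)

def fitted_slope(n):
    """Fit y = w*x + b by least squares on one random data set of size n."""
    x = rng.uniform(-1, 1, n)
    y = 2.0 * x + rng.normal(0, 1, n)   # true slope 2, noisy targets
    w, b = np.polyfit(x, y, deg=1)      # polyfit returns highest degree first
    return w

for n in (10, 1000):
    # Refit on 500 independent training sets; the std of the estimates
    # is an empirical measure of variance at this sample size.
    slopes = np.array([fitted_slope(n) for _ in range(500)])
    print(f"n={n:5d}  slope std = {slopes.std():.3f}")
```

The slope estimates scatter widely at $n = 10$ and tightly at $n = 1000$, even though the fitting procedure is identical.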

For a particular input $x$, the expected squared error of a predictor $\hat f_D$ trained on a data set $D$ decomposes as

$$\mathbb{E}\big[(y - \hat f_D(x))^2\big] = \underbrace{\sigma^2}_{\text{noise}} + \underbrace{\big(\mathbb{E}[y \mid x] - \mathbb{E}_D[\hat f_D(x)]\big)^2}_{\text{bias}^2} + \underbrace{\mathrm{Var}_D\big(\hat f_D(x)\big)}_{\text{variance}}$$

Increases in model complexity tend to increase variance and decrease bias.
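A sketch of this effect, assuming numpy (the $\sin$ ground truth, noise level, query point, and polynomial degrees are illustration choices, not from the notes): estimate bias² and variance of polynomial fits at a fixed query point $x_0$ by refitting on many independent training sets.

```python
import numpy as np

rng = np.random.default_rng(0)
f_true = np.sin                  # assumed true regression function E[y|x]
x0, n, trials = 1.0, 30, 500     # query point, training size, resamples

for degree in (1, 3, 9):
    preds = []
    for _ in range(trials):
        x = rng.uniform(-3, 3, n)
        y = f_true(x) + rng.normal(0, 0.3, n)
        coeffs = np.polyfit(x, y, deg=degree)
        preds.append(np.polyval(coeffs, x0))
    preds = np.array(preds)
    bias2 = (preds.mean() - f_true(x0)) ** 2   # squared gap to E[y|x0]
    var = preds.var()                          # spread across training sets
    print(f"degree={degree}  bias^2={bias2:.4f}  variance={var:.5f}")
```

Degree 1 shows large bias² and small variance; degree 9 the reverse, matching the statement above.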

- $\sigma^2$ (the **noise**) represents the inherent variability in $y$
- **Bias** is "how closely we can approximate $\mathbb{E}[y \mid x]$" in theory, with optimal parameters
- **Variance** is "how sensitive our parameter estimates are to the training data": how much the parameters will vary across training sets of a given size

So $\text{expected error} = \text{noise} + \text{bias}^2 + \text{variance}$.

For a data set $D$ of size $n$, we assume that each observation is an independent $(x_i, y_i)$ pair.

$p(D)$ is a distribution over all possible data sets of size $n$ (the frequentist view). With respect to $p(D)$, the fitted predictor $\hat f_D(x)$ is a random quantity, because $D$ is random with respect to $p(D)$.

Can define

$$\mathbb{E}_D\Big[\mathbb{E}\big[(y - \hat f_D(x))^2 \mid x, D\big]\Big];$$

this is the average error at $x$, now averaged over all possible data sets of size $n$.
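This averaged quantity can be checked by Monte Carlo, a sketch assuming numpy (the linear model, $\sin$ ground truth, and noise level are illustrative assumptions, not from the notes): simulate many data sets $D$, fit on each, measure the squared error on a fresh target at $x_0$, and compare the average against noise + bias² + variance.

```python
import numpy as np

rng = np.random.default_rng(0)
f_true, sigma = np.sin, 0.3          # assumed E[y|x] and noise std
x0, n, trials = 1.0, 50, 2000        # query point, |D|, number of data sets

preds, errors = [], []
for _ in range(trials):
    x = rng.uniform(-3, 3, n)                   # draw one data set D
    y = f_true(x) + rng.normal(0, sigma, n)
    w, b = np.polyfit(x, y, deg=1)              # fit a linear predictor on D
    pred = w * x0 + b
    preds.append(pred)
    y0 = f_true(x0) + rng.normal(0, sigma)      # fresh test target at x0
    errors.append((y0 - pred) ** 2)

preds = np.array(preds)
noise = sigma ** 2
bias2 = (preds.mean() - f_true(x0)) ** 2
var = preds.var()
print(f"avg error at x0      = {np.mean(errors):.4f}")
print(f"noise + bias^2 + var = {noise + bias2 + var:.4f}")
```

The two printed numbers agree up to Monte Carlo error, illustrating the decomposition numerically.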