Michael Data

The bias-variance tradeoff is a practical issue that applies to many predictive modeling problems; it is most easily analyzed in the regression setting.

There is a fundamental tradeoff between bias and variance: no single level of model complexity is best for every problem (a point related in spirit to, but distinct from, the no free lunch theorem).
Overly complex models tend to suffer from high variance. Overly simplistic models tend to suffer from high bias.

Bias

a.k.a. approximation error.
Bias: the expected difference between the model's predictions and the true targets. It can be reduced by making the model more complex and flexible. A high-bias model is one with few parameters, e.g. a linear predictor; a low-bias model is one with many parameters, e.g. a large neural network.
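A small NumPy sketch of this point (not from the original notes; the sine target, noise level, and polynomial degrees are illustrative choices): with a large sample, each model fits about as well as its family allows, so the remaining gap to the true function approximates the squared bias. The linear model cannot match the nonlinear target no matter how much data it sees.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative nonlinear ground truth.
def true_f(x):
    return np.sin(2 * np.pi * x)

# A large sample, so estimation error is negligible and the
# remaining error is dominated by approximation error (bias).
x = rng.uniform(0, 1, 10_000)
y = true_f(x) + rng.normal(0, 0.1, x.size)

# High-bias model: degree-1 polynomial (few parameters).
lin = np.polynomial.Polynomial.fit(x, y, deg=1)
# Lower-bias model: degree-9 polynomial (many parameters).
flex = np.polynomial.Polynomial.fit(x, y, deg=9)

# Mean squared gap to the true function on a dense grid.
grid = np.linspace(0, 1, 200)
bias_lin = np.mean((lin(grid) - true_f(grid)) ** 2)
bias_flex = np.mean((flex(grid) - true_f(grid)) ** 2)
print(f"approx. squared bias, linear: {bias_lin:.4f}")
print(f"approx. squared bias, deg-9:  {bias_flex:.4f}")
```

The linear model's gap stays large regardless of sample size, while the flexible model's gap is nearly zero, which is the sense in which added flexibility reduces bias.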

Variance

a.k.a. estimation error.
Variance: variability in the model's predictions across different training sets drawn from the same distribution. It can be reduced by increasing the number of observations.
For a particular input $x$, the prediction $\hat{f}(x; D)$ varies depending on which training set $D$ was observed.
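A quick simulation of this claim (a sketch, not from the notes; the linear ground truth, noise level, and sample sizes are illustrative): refit the same model on many independent training sets of size n and measure how much the prediction at one input varies. The variance shrinks as n grows.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative linear ground truth with unit-variance noise.
def true_f(x):
    return 1.0 + 2.0 * x

def fit_and_predict(n, x0=0.5):
    """Draw one training set of size n, fit a line, predict at x0."""
    x = rng.uniform(0, 1, n)
    y = true_f(x) + rng.normal(0, 1.0, n)
    return np.polyval(np.polyfit(x, y, 1), x0)

# Variance of the prediction at x0 across 2000 independent
# training sets, for increasing training-set sizes.
variances = {}
for n in (10, 100, 1000):
    preds = np.array([fit_and_predict(n) for _ in range(2000)])
    variances[n] = preds.var()
    print(f"n={n:5d}  prediction variance at x=0.5: {variances[n]:.4f}")
```

The printed variances drop roughly like 1/n, matching the claim that variance is reduced by collecting more observations.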

Theoretical Properties

Increases in model complexity tend to increase variance and decrease bias.

• $\sigma^2$ represents the inherent variability (noise) in $y$
• Bias is "how closely we can approximate $E[y|x]$" in theory, with optimal parameters
• Variance is "how sensitive our parameter estimates are to the training data": how much the parameters vary across training sets of a given size

So, for squared-error loss, the expected error decomposes as $E\big[(y - \hat{f}(x;D))^2\big] = \sigma^2 + \text{Bias}^2 + \text{Variance}$.

For a data set $D$ of size $N$, we assume that each observation is an $(x_i, y_i)$ pair.

$P(D)$ is a distribution over all possible data sets of size $N$ (the frequentist view). With respect to $P(D)$, the fitted model $\hat{f}(x; D)$ is a random quantity because $D$ is random with respect to $P(D)$.

Can define
$E_D\big[(y - \hat{f}(x; D))^2\big]$;
this is the average error at $x$, now averaged over multiple possible data sets of size $N$.
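This averaged error can be checked numerically (a Monte Carlo sketch, not from the notes; the sine target, noise level, and linear model are illustrative): estimate noise, squared bias, and variance separately at one input $x_0$, then compare their sum with a direct estimate of the averaged error.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 0.3            # noise std dev: inherent variability in y
n, reps, x0 = 30, 5000, 0.5

# Illustrative nonlinear ground truth.
def true_f(x):
    return np.sin(2 * np.pi * x)

# Fit a line to `reps` independent training sets of size n,
# each time recording the prediction at x0.
preds = np.empty(reps)
for i in range(reps):
    x = rng.uniform(0, 1, n)
    y = true_f(x) + rng.normal(0, sigma, n)
    preds[i] = np.polyval(np.polyfit(x, y, 1), x0)

bias_sq = (preds.mean() - true_f(x0)) ** 2   # squared bias at x0
variance = preds.var()                       # variance across data sets
noise = sigma ** 2                           # irreducible noise

# Direct Monte Carlo estimate of E_D[(y - f_hat(x0; D))^2].
y_new = true_f(x0) + rng.normal(0, sigma, reps)
total = np.mean((y_new - preds) ** 2)

print(f"noise + bias^2 + variance = {noise + bias_sq + variance:.4f}")
print(f"directly estimated error  = {total:.4f}")
```

The two printed numbers agree up to Monte Carlo error, illustrating the decomposition of the averaged error into noise, squared bias, and variance.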