**Linear models** are models that are linear in the parameters $\theta$. So the computational complexity is linear as a function of the number of parameters.

e.g. $f(x;\theta) = \theta_0 + \theta_1 x_1 + \dots + \theta_d x_d$

The model can have a non-linear functional form in the inputs (e.g. $f(x;\theta) = \theta^T \phi(x)$ for some basis functions $\phi$) while still being linear in the parameters. This is still considered a “linear” model.
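As a minimal sketch (with made-up coefficients), fitting a quadratic in $x$ is still an ordinary linear least-squares problem, because the model is linear in $\theta$ even though it is non-linear in $x$:

```python
import numpy as np

# Hypothetical model, non-linear in x but linear in theta:
# f(x; theta) = theta_0 + theta_1*x + theta_2*x^2
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 50)
y = 1.0 + 2.0 * x - 3.0 * x**2 + 0.1 * rng.standard_normal(50)

# Design matrix of basis functions phi(x) = [1, x, x^2]
X = np.column_stack([np.ones_like(x), x, x**2])

# Because f is linear in theta, ordinary least squares applies directly.
theta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(theta)  # close to the true coefficients [1, 2, -3]
```

The non-linearity lives entirely in the feature map; the fit itself is still a linear solve.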

Training data: a set of $N$ pairs $(x_i, y_i)$, where $x_i \in \mathbb{R}^d$ and $y_i \in \mathbb{R}$. $f(x;\theta)$ is the prediction model.

By using a linear model and a least-squares fitting method, the objective function remains convex.

A problem with linear models is that they can return results which are negative or otherwise not nicely interpretable as probabilities. A common solution is to use a logistic function and perform Logistic Regression.

- $f(x;\theta) = \sigma(\theta^T x) = \dfrac{1}{1 + e^{-\theta^T x}}$
- This is an example of a generalized linear model where the inverse of $\sigma$ (the logit) is referred to as the
`link function`
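A tiny sketch of the idea, using hypothetical parameter values: the linear score $\theta^T x$ can be any real number, and the logistic function squashes it into a valid probability.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real score to the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

theta = np.array([0.5, -1.0])   # hypothetical parameters
x = np.array([2.0, 1.0])        # hypothetical input
p = sigmoid(theta @ x)          # interpreted as P(y=1 | x) under the model
print(p)                        # score is 0.0 here, so p = 0.5
```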

A least-squares error function means $E(\theta) = \sum_{i=1}^{N} (y_i - f(x_i;\theta))^2$ is a convex function, so we can use gradient descent to find $\theta^*$:

$\theta \leftarrow \theta - \alpha \nabla_\theta E(\theta)$, where $\alpha$ is a step size and $\nabla_\theta E(\theta)$ is the gradient.
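A minimal sketch of this update rule on synthetic data (assumed true parameters and step size are made up for illustration); because the objective is convex, the iterates converge toward the least-squares solution:

```python
import numpy as np

rng = np.random.default_rng(1)
N, d = 200, 3
X = rng.standard_normal((N, d))
theta_true = np.array([1.0, -2.0, 0.5])      # hypothetical ground truth
y = X @ theta_true + 0.05 * rng.standard_normal(N)

theta = np.zeros(d)
alpha = 0.1  # step size
for _ in range(500):
    # Gradient of the mean squared error with respect to theta
    grad = -2.0 / N * X.T @ (y - X @ theta)
    theta = theta - alpha * grad             # gradient descent update

print(theta)  # close to theta_true
```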

- $y$ is the target
- $\hat{y} = X\theta$ is the prediction, where $X$ is an N-by-d matrix and $y$ is an N-by-1 vector.

Can show that the MSE is minimized when

- $X^T X \theta = X^T y$, or the form $A\theta = b$, where in this form $\theta$ is the solution of a set of $d$ linear equations
- $\theta = (X^T X)^{-1} X^T y$
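A quick numerical sketch (synthetic data, made-up coefficients) showing that solving the normal equations and applying the explicit inverse give the same answer:

```python
import numpy as np

rng = np.random.default_rng(2)
N, d = 100, 4
X = rng.standard_normal((N, d))
y = X @ np.array([2.0, 0.0, -1.0, 3.0]) + 0.01 * rng.standard_normal(N)

# Solve the normal equations X^T X theta = X^T y directly ...
theta_solve = np.linalg.solve(X.T @ X, X.T @ y)

# ... which matches the explicit form theta = (X^T X)^{-1} X^T y.
theta_inv = np.linalg.inv(X.T @ X) @ X.T @ y

print(np.allclose(theta_solve, theta_inv))  # True
```

In practice `np.linalg.solve` (or `lstsq`) is preferred over forming the inverse explicitly, for both speed and numerical stability.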

If you have a low-dimensional problem and lots of data, this is “easy”. If it is a high-dimensional problem and some of the features are codependent, then the problem can be underdetermined and difficult or impossible to solve this way.

Both methods have complexity $O(Nd^2 + d^3)$. The $O(Nd^2)$ is for creating the matrix $X^T X$, and the $O(d^3)$ is for solving the $d$ linear equations or inverting a $d$-by-$d$ matrix.

Because $E(\theta)$ is a convex function, in principle you can use gradient descent to find $\theta^*$.

Assume $(x_i, y_i)$ are IID samples from some underlying density $p(x, y)$. The expected error is then

$E_{p(x,y)}\big[(y - f(x;\theta))^2\big] = \int \left[ \int (y - f(x;\theta))^2 \, p(y \mid x) \, dy \right] p(x) \, dx$

- $p(y \mid x)$ is a distribution over $y$ for a fixed $x$ value. It may vary due to noise or variance.
- So the inner bracketed term becomes the expected value of the MSE for a given $x$, due to the uncertainty of $y$.

The inner error term decomposes into a part that is fixed and a part affected by the parameters:

$E_{y \mid x}\big[(y - f(x;\theta))^2\big] = \mathrm{Var}(y \mid x) + \big(E[y \mid x] - f(x;\theta)\big)^2$

- The first term is the “natural” variability in $y$ at a particular $x$ value
- The second term can be manipulated by controlling $\theta$
- So the optimal model will minimize the second term by setting $f(x;\theta) = E[y \mid x]$
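A small simulation (assuming, for illustration, Gaussian noise at a single fixed $x$) makes the decomposition concrete: predicting the conditional mean leaves only the irreducible variance, while any other prediction pays an extra squared-bias penalty.

```python
import numpy as np

rng = np.random.default_rng(3)
mu, sigma = 2.0, 0.5          # assumed E[y|x] and noise level at a fixed x
y = mu + sigma * rng.standard_normal(100_000)

def mse(pred):
    """Empirical mean squared error of a constant prediction at this x."""
    return np.mean((y - pred) ** 2)

# Predicting the conditional mean leaves only the variance term ...
print(mse(mu))        # about sigma^2 = 0.25
# ... while an offset prediction adds the squared bias on top.
print(mse(mu + 1.0))  # about sigma^2 + 1.0^2 = 1.25
```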

In practice, regression is also limited by the Bias-Variance Tradeoff.