**Machine Learning** is a class of techniques for learning a model of data, frequently used in data mining and business intelligence.

**Learning** refers to estimating the parameters of a model. It is also referred to as **inference** in Bayesian methods. Once a model has been learned, it can be used for predictive modeling or exploratory data analysis.

In a **generative model**, there is also usually a way to “go back” and generate or simulate data by fixing the parameters.

From a Bayesian framework, we would want to estimate something like $p(\theta \mid D) \propto p(D \mid \theta)\, p(\theta)$.

The prior probability over $\theta$ can be controversial, so ignoring it (or using a flat prior), the likelihood alone is typically used to carry out the parameter estimation.

for data $(x_i, y_i)$, $i = 1, \dots, n$, and $y_i \in \{0, 1\}$.

We can write the likelihood as $p(D \mid \theta) = \prod_{i=1}^{n} p(y_i \mid x_i, \theta)$ using conditional independence.

This is a conditional model, but we haven't defined the actual probabilities yet. The function most commonly used in machine learning is the logistic function: $p(y = 1 \mid x, \theta) = \dfrac{1}{1 + e^{-\theta^{\top} x}}$.

It defines a linear decision boundary. This is logistic regression, an example of a **generalized linear model**.
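As a minimal sketch of logistic regression (the data, learning rate, and step count below are illustrative assumptions, not from the notes), we can fit $\theta$ by gradient ascent on the conditional log-likelihood:

```python
import numpy as np

def sigmoid(z):
    # Logistic function: maps a real score to a probability in (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, steps=1000):
    # Maximize the conditional log-likelihood by gradient ascent.
    # X: (n, d) features, y: (n,) labels in {0, 1}; theta defines the
    # linear decision boundary theta^T x = 0.
    theta = np.zeros(X.shape[1])
    for _ in range(steps):
        p = sigmoid(X @ theta)
        theta += lr * X.T @ (y - p) / len(y)
    return theta

# Toy 1-D example with a bias column: y = 1 when x > 0.
X = np.array([[-2.0, 1.0], [-1.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
theta = fit_logistic(X, y)
```

The gradient $X^{\top}(y - p)$ is the derivative of the Bernoulli log-likelihood under the logistic model, so each step moves $\theta$ toward better-calibrated probabilities.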


Learn a model for the joint distribution $p(x, c)$ to predict the class label $c$ of a new vector $x$.

We can compute $p(c = k \mid x)$ using Bayes' rule:

$p(c = k \mid x) = \dfrac{p(x \mid c = k)\, p(c = k)}{\sum_{j} p(x \mid c = j)\, p(c = j)}$.

**Key Points**

- Learn a model $p(x \mid c = k)$ for how the $x$s are distributed for each class
    - i.e. uses parameters $\theta_k$ for class $k$
    - requires a **distributional/parametric assumption**, e.g. a multivariate Gaussian model
- Also have to learn the class prior values $p(c = k)$, though this is easy
- The likelihood decomposes into separate optimization problems if the $\theta_k$s are unrelated
- This approach is theoretically optimal if:
    - the distributional assumptions are correct
    - we can learn the true/optimal parameters
- Predict using Bayes' rule: $p(c = k \mid x) \propto p(x \mid c = k)\, p(c = k)$

Need to learn $K$ sets of parameters $(\mu_k, \Sigma_k)$, for $k = 1, \dots, K$.

There is sensitivity to the Gaussian assumption. With $O(d^2)$ covariance parameters per class, this can scale poorly as $d$ increases. In practice, in high-dimensional problems you can assume that the $\Sigma_k$s are diagonal.
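A sketch of this class-conditional Gaussian classifier with the diagonal-covariance assumption (function names and the toy data are my own illustrations):

```python
import numpy as np

def fit_gaussian_classes(X, y):
    # Estimate a diagonal-covariance Gaussian per class, plus class priors
    # p(c = k) from class frequencies.
    params = {}
    for k in np.unique(y):
        Xk = X[y == k]
        # (mean vector, per-feature variances, class prior); the small
        # constant guards against zero variance on a feature.
        params[k] = (Xk.mean(axis=0), Xk.var(axis=0) + 1e-6, len(Xk) / len(X))
    return params

def log_posterior(x, params):
    # log p(c = k | x) up to a constant: log p(x | c = k) + log p(c = k).
    scores = {}
    for k, (mu, var, prior) in params.items():
        ll = -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var)
        scores[k] = ll + np.log(prior)
    return scores

def predict(x, params):
    scores = log_posterior(x, params)
    return max(scores, key=scores.get)

# Two well-separated toy classes.
X = np.array([[0.0, 0.1], [0.1, -0.1], [-0.1, 0.0],
              [5.0, 5.1], [5.1, 4.9], [4.9, 5.0]])
y = np.array([0, 0, 0, 1, 1, 1])
params = fit_gaussian_classes(X, y)
```

With diagonal covariances, each class needs only $2d$ Gaussian parameters instead of $O(d^2)$, which is the scaling point made above.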

In Naive Bayes classification, you model $p(x \mid c = k) = \prod_{j=1}^{d} p(x_j \mid c = k)$, i.e. the features are treated as conditionally independent given the class.

With sequence data, we can do classification by learning a Markov model for each class.

- $t$ is the position index in the sequence
- $a_{ij}$ is the transition probability from state $i$ to state $j$.
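A sketch of this per-class Markov-model classifier, ignoring the initial-state distribution for simplicity (the smoothing and toy sequences are illustrative assumptions):

```python
import numpy as np

def fit_markov(seqs, n_states, alpha=1.0):
    # Estimate transition probabilities a_ij from sequences of state ids,
    # with add-alpha smoothing so unseen transitions keep nonzero mass.
    counts = np.full((n_states, n_states), alpha)
    for s in seqs:
        for i, j in zip(s[:-1], s[1:]):
            counts[i, j] += 1
    return counts / counts.sum(axis=1, keepdims=True)

def seq_log_lik(s, A):
    # Log-likelihood of a sequence under transition matrix A.
    return sum(np.log(A[i, j]) for i, j in zip(s[:-1], s[1:]))

def classify_seq(s, models):
    # Pick the class whose Markov model gives the highest likelihood.
    return max(models, key=lambda k: seq_log_lik(s, models[k]))

# Class 0 alternates states; class 1 stays in state 0.
models = {0: fit_markov([[0, 1, 0, 1, 0, 1]], 2),
          1: fit_markov([[0, 0, 0, 0, 0, 0]], 2)}
```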

Treat the parameters as random variables.

In particular, before observing any data, there is the **prior** $p(\theta)$, a prior density for $\theta$.

As more data is gathered, the role of the prior is reduced. It is more influential when data is limited.

$p(\theta \mid D)$ is known as the **posterior** density. In comparison to Maximum Likelihood Estimation:

e.g. $\theta$ = mean weight of fish in a lake, assuming a Gaussian likelihood $x_i \sim N(\theta, \sigma^2)$.

Here the prior is $\theta \sim N(\mu_0, \sigma_0^2)$, where $\mu_0$ is the mean of the prior and $\sigma_0^2$ is the variance of the prior.

The posterior is Gaussian, $p(\theta \mid D) = N(\mu_n, \sigma_n^2)$, with

$\mu_n = \dfrac{n\sigma_0^2\,\bar{x} + \sigma^2 \mu_0}{n\sigma_0^2 + \sigma^2}$, $\quad \sigma_n^2 = \dfrac{\sigma_0^2\,\sigma^2}{n\sigma_0^2 + \sigma^2}$.

A common choice for a prior on $\theta$ is the Beta density: $p(\theta) = \text{Beta}(\theta \mid \alpha, \beta) \propto \theta^{\alpha - 1}(1 - \theta)^{\beta - 1}$.

The posterior is also in a Beta form due to **conjugacy**.

The MPE effectively smooths the maximum likelihood estimate. The larger $\alpha$ and $\beta$ are, the more smoothing. $\alpha$ and $\beta$ are referred to as **pseudocounts** because they effectively count the successes and trials of previous (imaginary) trials: $\alpha$ is the pseudocount for the number of successes, and $\alpha + \beta$ is the pseudocount for the prior number of trials.

As $n \to \infty$, the MPE for $\theta$ approaches the maximum likelihood estimate $r/n$. The variance of $p(\theta \mid D) \to 0$ as $n \to \infty$.
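The conjugate Beta-Bernoulli update is simple enough to sketch directly (the prior values and counts below are illustrative):

```python
def beta_posterior(alpha, beta, successes, trials):
    # Conjugate update: Beta(alpha, beta) prior + Bernoulli data with
    # r successes in n trials -> Beta(alpha + r, beta + n - r) posterior.
    return alpha + successes, beta + (trials - successes)

def posterior_mean(alpha, beta):
    # Mean of a Beta(alpha, beta) density.
    return alpha / (alpha + beta)

# Prior Beta(2, 2) (prior mean 0.5); observe 7 successes in 10 trials.
a, b = beta_posterior(2.0, 2.0, 7, 10)
```

The posterior mean $9/14 \approx 0.64$ sits between the prior mean $0.5$ and the MLE $0.7$, which is the smoothing behavior described above; with more data it converges to the MLE.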

The posterior density is the same form as the prior when using a **conjugate prior**. In this case, the Beta is a conjugate prior to the Bernoulli.

$x \in \{1, \dots, K\}$, $p(x = j) = \theta_j$, $\theta_j \ge 0$, $\sum_{j=1}^{K} \theta_j = 1$.

e.g. $x$ = occurrence of a word in a document, $K$ = number of unique words.

$n_j$ = number of $x_i$'s taking value $j$ in the data, with $\sum_{j=1}^{K} n_j = n$.

The likelihood is $p(D \mid \theta) = \prod_{j=1}^{K} \theta_j^{n_j}$, maximized by $\hat{\theta}_j = n_j / n$.

The maximum likelihood estimate in this case may require smoothing if we don't want it to assign zero probability to outcomes that never occur in the training data.
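To make the zero-probability problem concrete, here is a sketch comparing the raw MLE with add-$\alpha$ (Laplace) smoothing on made-up word counts:

```python
def multinomial_mle(counts):
    # theta_j = n_j / n; assigns zero probability to unseen outcomes.
    n = sum(counts)
    return [c / n for c in counts]

def smoothed_estimate(counts, alpha=1.0):
    # Add-alpha smoothing: theta_j = (n_j + alpha) / (n + K * alpha).
    n, K = sum(counts), len(counts)
    return [(c + alpha) / (n + K * alpha) for c in counts]

counts = [5, 3, 0, 2]  # illustrative word counts; third word never observed
mle = multinomial_mle(counts)
smoothed = smoothed_estimate(counts)
```

The MLE gives the unseen word probability zero, while smoothing gives it a small positive probability and still yields a valid distribution.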

A conjugate prior to the multinomial is the Dirichlet distribution. This is a generalization of the Beta to higher dimensions.

The parameters $\alpha_1, \dots, \alpha_K$ are directly analogous to the $\alpha, \beta$ parameters of the Beta prior in the binomial case.

e.g. for text, the $\alpha_j$s could be proportional to the frequency of words in typical English text.

The posterior density will have the form $\text{Dirichlet}(\alpha_1', \dots, \alpha_K')$, where $\alpha_j' = \alpha_j + n_j$.

The **prior mean** is $E[\theta_j] = \dfrac{\alpha_j}{\sum_{k} \alpha_k}$.

The **posterior mean** is $E[\theta_j \mid D] = \dfrac{\alpha_j + n_j}{n + \sum_{k} \alpha_k}$.
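The Dirichlet-multinomial update can be sketched in a few lines (the symmetric prior and counts are illustrative):

```python
def dirichlet_posterior_mean(alphas, counts):
    # Posterior is Dirichlet(alpha_j + n_j); its mean is
    # (alpha_j + n_j) / (n + sum_k alpha_k) per component.
    total = sum(alphas) + sum(counts)
    return [(a + n) / total for a, n in zip(alphas, counts)]

alphas = [1.0, 1.0, 1.0]  # symmetric prior (uniform over the simplex)
counts = [4, 0, 1]        # observed counts n_j; outcome 1 never seen
mean = dirichlet_posterior_mean(alphas, counts)
```

Note the unseen outcome still gets positive posterior mass, so this is the multi-category analogue of the Beta pseudocount smoothing above.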

The Gaussian case is common in tracking problems, where we assume that the movement of the tracked object has some Gaussian noise to it.

$x_i \sim N(\theta, \sigma^2)$ for $i = 1, \dots, n$, assuming $\sigma^2$ is known, and the $x_i$s are conditionally independent given $\theta$.

The conjugate prior is Gaussian.

$p(\theta) = N(\mu_0, \sigma_0^2)$, where $\mu_0$ is the mean of the prior, and $\sigma_0^2$ represents uncertainty about the prior.

The posterior is $p(\theta \mid D) = N(\mu_n, \sigma_n^2)$, with $\mu_n = \dfrac{n\sigma_0^2\,\bar{x} + \sigma^2 \mu_0}{n\sigma_0^2 + \sigma^2}$ and $\sigma_n^2 = \dfrac{\sigma_0^2\,\sigma^2}{n\sigma_0^2 + \sigma^2}$.
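This conjugate Gaussian update is a precision-weighted blend of prior and data, sketched below (the numeric values are illustrative):

```python
def gaussian_posterior(mu0, var0, xbar, var, n):
    # Conjugate update for the mean of a Gaussian with known variance var:
    # posterior precision adds the prior precision and n data precisions,
    # and the posterior mean weights mu0 and xbar by those precisions.
    var_n = 1.0 / (1.0 / var0 + n / var)
    mu_n = var_n * (mu0 / var0 + n * xbar / var)
    return mu_n, var_n

# Prior N(0, 1); 100 observations with sample mean 10 and known variance 4.
mu_n, var_n = gaussian_posterior(0.0, 1.0, 10.0, 4.0, 100)
```

The posterior mean lies between the prior mean and the sample mean, and the posterior variance shrinks as $n$ grows, matching the earlier remark that the prior matters less with more data.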