Binary Classification Terminology

True positive “TP” = Model predicted positive, True label is also positive
False positive “FP” = Model predicted positive, True label is negative
False negative “FN” = Model predicted negative, True label is positive
True negative “TN” = Model predicted negative, True label is also negative

True Positive Rate “TPR” = TP / (TP + FN)
False Positive Rate “FPR” = FP / (FP + TN)

  • a.k.a. false alarm rate

Precision = TP/(TP+FP) = ratio of correctly predicted positives to total predicted positives
Recall = TP/(TP+FN) = ratio of correctly predicted positives to the actual number of positives

There is typically a tradeoff between precision and recall.
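
To make these definitions concrete, here is a minimal Python sketch that computes the metrics from label lists; the data and the names y_true and y_pred are illustrative, not from the notes.

    def confusion_counts(y_true, y_pred):
        """Count TP, FP, TN, FN for binary 0/1 labels."""
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
        tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
        return tp, fp, tn, fn

    y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # made-up labels
    y_pred = [1, 0, 0, 1, 0, 1, 1, 0]  # made-up predictions
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)

    tpr = tp / (tp + fn)        # true positive rate (= recall)
    fpr = fp / (fp + tn)        # false positive rate (false alarm rate)
    precision = tp / (tp + fp)  # correct positives / predicted positives
    print(tpr, fpr, precision)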

Calibration

To evaluate the probabilities used for ranking, check calibration: among examples assigned predicted probability p, roughly a fraction p should actually be positive.
Logistic regression tends to be better calibrated than naive Bayes.
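
As a sketch of what checking calibration means operationally, the following bins predicted probabilities and compares the average predicted probability against the empirical positive rate in each bin; the probs and labels values are hypothetical, and a well-calibrated model should show the two columns roughly agreeing.

    def reliability_table(probs, labels, n_bins=5):
        """Bin predictions and compare mean predicted prob to empirical rate."""
        bins = [[] for _ in range(n_bins)]
        for p, y in zip(probs, labels):
            idx = min(int(p * n_bins), n_bins - 1)  # clamp p == 1.0 into last bin
            bins[idx].append((p, y))
        rows = []
        for b in bins:
            if not b:
                continue
            mean_prob = sum(p for p, _ in b) / len(b)  # average predicted probability
            frac_pos = sum(y for _, y in b) / len(b)   # empirical positive frequency
            rows.append((mean_prob, frac_pos, len(b)))
        return rows

    probs  = [0.1, 0.2, 0.3, 0.5, 0.6, 0.8, 0.9, 0.95]
    labels = [0,   0,   1,   0,   1,   1,   1,   1]
    for mean_prob, frac_pos, n in reliability_table(probs, labels):
        print(f"predicted {mean_prob:.2f} vs actual {frac_pos:.2f} (n={n})")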

Ranking Visualization

ROC Curves

A receiver operating characteristic (ROC) curve plots the FPR on the horizontal axis and the TPR on the vertical axis. The area under this curve (AUC) is sometimes used as a point statistic to represent performance.
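
A minimal sketch of tracing ROC points and computing AUC by the trapezoid rule, assuming hypothetical scores (higher means more positive) and 0/1 labels; ties in scores are glossed over for simplicity.

    def roc_points(scores, labels):
        """Sweep the threshold down the ranked list, emitting (FPR, TPR) points."""
        pairs = sorted(zip(scores, labels), key=lambda x: -x[0])  # descending by score
        P = sum(labels)
        N = len(labels) - P
        tp = fp = 0
        points = [(0.0, 0.0)]
        for _, y in pairs:  # lower the threshold past one example at a time
            if y == 1:
                tp += 1
            else:
                fp += 1
            points.append((fp / N, tp / P))
        return points

    def auc(points):
        """Area under the curve via the trapezoid rule."""
        area = 0.0
        for (x0, y0), (x1, y1) in zip(points, points[1:]):
            area += (x1 - x0) * (y0 + y1) / 2
        return area

    scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
    labels = [1,   1,   0,   1,   0,    1,   0,   0]
    print("AUC =", auc(roc_points(scores, labels)))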

The F-measure (the harmonic mean of precision and recall, F1 = 2·Precision·Recall / (Precision + Recall)) and the precision-recall breakeven point are point-estimate summaries of the precision/recall tradeoff.
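
For illustration, a small sketch of the F-measure computed from precision and recall; the input values are made up.

    def f_measure(precision, recall, beta=1.0):
        """F_beta score; beta=1 gives the usual F1 (harmonic mean)."""
        b2 = beta * beta
        return (1 + b2) * precision * recall / (b2 * precision + recall)

    print(f_measure(0.75, 0.60))  # F1 ≈ 0.667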

Lift Curves

An alternative is a lift curve. Lift = (number of positive examples detected by the model in the examined fraction of the ranking) / (number of positive examples a random selection of the same size would find on average)
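
A minimal sketch of computing lift at a given depth of the ranking, again with hypothetical scores and labels; lift above 1 means the model beats random selection of the same size.

    def lift_at(scores, labels, fraction):
        """Lift in the top `fraction` of the score-ranked list."""
        ranked = sorted(zip(scores, labels), key=lambda x: -x[0])
        k = max(1, int(len(ranked) * fraction))
        pos_in_top = sum(y for _, y in ranked[:k])  # positives found by the model
        expected_random = fraction * sum(labels)    # positives random picking would find
        return pos_in_top / expected_random

    scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
    labels = [1,   1,   0,   1,   0,    1,   0,   0]
    print(lift_at(scores, labels, 0.25))  # lift in the top 25% of the ranking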

Profit Curves

For test data, one can plot profit on the vertical axis and the fraction of examples predicted positive on the horizontal axis.
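
A minimal sketch of building profit-curve points, assuming a made-up cost/benefit structure in which each true positive earns a fixed benefit and each false positive incurs a fixed cost; the benefit and cost values are illustrative.

    def profit_curve(scores, labels, benefit_tp=10.0, cost_fp=2.0):
        """Sweep the fraction predicted positive down the ranked list, tracking profit."""
        ranked = sorted(zip(scores, labels), key=lambda x: -x[0])
        n = len(ranked)
        profit = 0.0
        points = [(0.0, 0.0)]  # (fraction predicted positive, cumulative profit)
        for i, (_, y) in enumerate(ranked, start=1):
            profit += benefit_tp if y == 1 else -cost_fp
            points.append((i / n, profit))
        return points

    scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
    labels = [1,   1,   0,   1,   0,    1,   0,   0]
    for frac, profit in profit_curve(scores, labels):
        print(f"{frac:.3f} predicted positive -> profit {profit:+.1f}")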