Obtaining Calibrated Probabilities from Boosting

Alexandru Niculescu-Mizil; Richard A. Caruana

arXiv:1207.1403 · cs.LG · July 9, 2012 · 155 cites

TL;DR

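This paper investigates why boosting, and AdaBoost in particular, predicts poorly calibrated probabilities. It compares three calibration methods (Platt Scaling, Isotonic Regression, and Logistic Correction) and tests whether boosting with log-loss instead of the usual exponential loss gives better probability estimates.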

Contribution

The paper empirically analyzes how AdaBoost distorts predicted probabilities, and evaluates both calibration methods and an alternative loss function (log-loss) as ways to improve those estimates.

Findings

01

Logistic Correction and boosting with log-loss work well when boosting weak models such as decision stumps.

02

Platt Scaling and Isotonic Regression significantly improve probability calibration.

03

Calibration methods vary in effectiveness depending on model complexity.
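
To make the Isotonic Regression finding concrete, here is a minimal sketch of the pool-adjacent-violators (PAV) procedure commonly used to fit an isotonic calibration map. It is an illustration under stated assumptions, not the authors' implementation: the function names `pav_isotonic` and `predict`, the toy data, and the step-function lookup (rather than interpolation between blocks) are all hypothetical choices made for this example.

```python
def pav_isotonic(scores, labels):
    """Fit a non-decreasing step function from raw scores to probabilities."""
    # Sort the held-out calibration examples by raw model score.
    pairs = sorted(zip(scores, labels))
    # Each block is [sum of labels, example count, highest score in block].
    blocks = []
    for s, y in pairs:
        if blocks and blocks[-1][2] == s:
            # Tied scores must share one calibrated value, so pool them.
            blocks[-1][0] += y
            blocks[-1][1] += 1
        else:
            blocks.append([float(y), 1, s])
        # Merge adjacent blocks while their means are not strictly increasing.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] >= blocks[-1][0] / blocks[-1][1]
        ):
            total, count, upper = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
            blocks[-1][2] = upper
    # One (upper score bound, calibrated probability) pair per block.
    return [(upper, total / count) for total, count, upper in blocks]


def predict(model, score):
    """Look up the calibrated probability for a new raw score."""
    for upper, prob in model:
        if score <= upper:
            return prob
    # Scores above every block are clamped to the last block's value.
    return model[-1][1]


# Toy calibration set (made up for illustration): raw scores and 0/1 labels.
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
labels = [0, 1, 0, 1, 1, 1]
model = pav_isotonic(scores, labels)
```

On this toy data the out-of-order labels at scores 0.2 and 0.3 are pooled into one block, so the fitted map never decreases as the raw score increases. In practice the map would be fit on a calibration set held out from the data used to train the boosted model.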

Abstract

Boosted decision trees typically yield good accuracy, precision, and ROC area. However, because the outputs from boosting are not well calibrated posterior probabilities, boosting yields poor squared error and cross-entropy. We empirically demonstrate why AdaBoost predicts distorted probabilities and examine three calibration methods for correcting this distortion: Platt Scaling, Isotonic Regression, and Logistic Correction. We also experiment with boosting using log-loss instead of the usual exponential loss. Experiments show that Logistic Correction and boosting with log-loss work well when boosting weak models such as decision stumps, but yield poor performance when boosting more complex models such as full decision trees. Platt Scaling and Isotonic Regression, however, significantly improve the probabilities predicted by
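
The two parametric methods named in the abstract can be sketched in a few lines. The formulas below are the forms usually given in the literature, not taken from this page: Logistic Correction maps the boosting score f(x) through 1 / (1 + exp(-2 f(x))), and Platt Scaling fits p = 1 / (1 + exp(A f(x) + B)) on held-out data. The function names, the toy data, and the use of plain gradient descent (with no target smoothing) are assumptions made for this illustration.

```python
import math


def logistic_correction(f):
    """Map a boosting score f(x) to a probability with a fixed sigmoid."""
    return 1.0 / (1.0 + math.exp(-2.0 * f))


def platt_predict(f, a, b):
    """Platt Scaling sigmoid: p = 1 / (1 + exp(A*f + B))."""
    return 1.0 / (1.0 + math.exp(a * f + b))


def fit_platt(scores, labels, lr=0.1, steps=5000):
    """Fit A and B by gradient descent on the mean log-loss."""
    a, b = 0.0, 0.0
    n = len(scores)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for f, y in zip(scores, labels):
            p = platt_predict(f, a, b)
            # For this parameterization the log-loss gradient is
            # (y - p) * f with respect to A and (y - p) with respect to B.
            grad_a += (y - p) * f
            grad_b += (y - p)
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


# Toy calibration set (made up for illustration): scores and 0/1 labels.
scores = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
labels = [0, 0, 1, 0, 1, 1]
a, b = fit_platt(scores, labels)
```

The difference between the two is that Logistic Correction has no free parameters, while Platt Scaling learns the slope and offset of the sigmoid from a calibration set. With higher scores tending to be positive, as in the toy data, the fitted A comes out negative, so the calibrated probability increases with the raw score.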

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Imbalanced Data Classification Techniques · Machine Learning and Data Classification