When is Multicalibration Post-Processing Necessary?
Dutch Hansen, Siddartha Devic, Preetum Nakkiran, Vatsal Sharan

TL;DR
This paper evaluates when multicalibration post-processing is necessary for predictive models, showing it benefits uncalibrated and large models, and that traditional calibration can sometimes imply multicalibration.
Contribution
It provides the first comprehensive evaluation of multicalibration post-processing across diverse datasets and models, offering practical insights and a new Python package.
Findings
Models calibrated out of the box are often multicalibrated without post-processing.
Multicalibration post-processing can help inherently uncalibrated models and large models.
Traditional calibration measures can sometimes provide multicalibration implicitly.
Abstract
Calibration is a well-studied property of predictors which guarantees meaningful uncertainty estimates. Multicalibration is a related notion -- originating in algorithmic fairness -- which requires predictors to be simultaneously calibrated over a potentially complex and overlapping collection of protected subpopulations (such as groups defined by ethnicity, race, or income). We conduct the first comprehensive study evaluating the usefulness of multicalibration post-processing across a broad set of tabular, image, and language datasets for models spanning from simple decision trees to 90 million parameter fine-tuned LLMs. Our findings can be summarized as follows: (1) models which are calibrated out of the box tend to be relatively multicalibrated without any additional post-processing; (2) multicalibration post-processing can help inherently uncalibrated models and large vision and…
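To make the distinction between calibration and multicalibration concrete, here is a minimal sketch that compares a binned expected calibration error (ECE) on the whole population with the worst ECE over a collection of possibly overlapping subgroups. It is an illustration under stated assumptions, not the paper's evaluation code: the function names `ece` and `max_group_ece`, the equal-width binning, and the toy data are all hypothetical, and none of this is the API of the Python package mentioned above.

```python
import numpy as np

def ece(probs, labels, n_bins=10):
    """Binned expected calibration error for binary predictions (illustrative)."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=float)
    if probs.size == 0:
        return 0.0
    # Assign each prediction to one of n_bins equal-width bins on [0, 1].
    bin_ids = np.minimum((probs * n_bins).astype(int), n_bins - 1)
    total = 0.0
    for b in range(n_bins):
        in_bin = bin_ids == b
        if in_bin.any():
            # Gap between mean predicted probability and observed label rate.
            gap = abs(probs[in_bin].mean() - labels[in_bin].mean())
            # Weight the gap by the fraction of samples that fall in this bin.
            total += in_bin.mean() * gap
    return total

def max_group_ece(probs, labels, groups, n_bins=10):
    """Worst ECE over a dict of (possibly overlapping) boolean group masks."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=float)
    return max(ece(probs[mask], labels[mask], n_bins) for mask in groups.values())

# Toy data (hypothetical): a predictor that outputs 0.5 for everyone.
probs = np.array([0.5, 0.5, 0.5, 0.5])
labels = np.array([1, 1, 0, 0])
groups = {
    "group_a": np.array([True, True, False, False]),
    "group_b": np.array([False, False, True, True]),
}

overall = ece(probs, labels)                  # 0.0: calibrated on average
worst = max_group_ece(probs, labels, groups)  # 0.5: miscalibrated on each group
```

In this toy case the predictor is perfectly calibrated over the full population yet off by 0.5 on each subgroup, which is the kind of gap multicalibration is meant to rule out.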
Taxonomy
Topics: Ethics and Social Impacts of AI · Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning
Methods: Sparse Evolutionary Training
