On the Richness of Calibration
Benedikt Höltgen, Robert C Williamson

TL;DR
This paper analyzes various calibration scores for probabilistic predictions, proposing a comprehensive framework that includes grouping choices and error agglomeration, and introduces novel fairness measures at both population and individual levels.
Contribution
It systematically compares calibration scores, formalizes new grouping and agglomeration methods, and introduces fairness deviation measures with desirable properties.
Findings
Grouping data by input features has advantages over prediction-based grouping.
Axiomatization of fairness deviation measures enables new fairness notions.
Framework unifies and extends existing calibration and fairness scores.
Abstract
Probabilistic predictions can be evaluated through comparisons with observed label frequencies, that is, through the lens of calibration. Recent scholarship on algorithmic fairness has started to look at a growing variety of calibration-based objectives under the name of multi-calibration but has still remained fairly restricted. In this paper, we explore and analyse forms of evaluation through calibration by making explicit the choices involved in designing calibration scores. We organise these into three grouping choices and a choice concerning the agglomeration of group errors. This provides a framework for comparing previously proposed calibration scores and helps to formulate novel ones with desirable mathematical properties. In particular, we explore the possibility of grouping datapoints based on their input features rather than on predictions and formally demonstrate advantages…
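The abstract describes a calibration score as the result of design choices: how datapoints are grouped, and how the per-group errors are agglomerated into a single number. The Python sketch below illustrates that idea under our own simplifying assumptions. It is not the authors' framework. The function name `calibration_score`, the absolute-difference group error, the two agglomeration options (size-weighted mean and maximum), and the toy data and `feature` column are all hypothetical. A single `group_of` function stands in for the grouping choices.

```python
from collections import defaultdict
from typing import Callable, Hashable, Sequence


def calibration_score(
    preds: Sequence[float],
    labels: Sequence[int],
    group_of: Callable[[int], Hashable],
    agglomerate: str = "weighted_mean",
) -> float:
    """Hypothetical calibration score with a pluggable grouping and agglomeration.

    Each datapoint i is put into the group group_of(i). A group's error is
    |mean prediction - observed label frequency| within that group. Group
    errors are then agglomerated by a size-weighted mean or by their maximum.
    """
    if len(preds) != len(labels) or not preds:
        raise ValueError("preds and labels must be non-empty and of equal length")

    # Grouping choice: collect datapoint indices per group key.
    groups: dict = defaultdict(list)
    for i in range(len(preds)):
        groups[group_of(i)].append(i)

    # Per-group error: gap between average prediction and label frequency.
    errors = []  # list of (group size, group error)
    for members in groups.values():
        mean_pred = sum(preds[i] for i in members) / len(members)
        freq = sum(labels[i] for i in members) / len(members)
        errors.append((len(members), abs(mean_pred - freq)))

    # Agglomeration choice: combine group errors into one score.
    if agglomerate == "weighted_mean":
        return sum(size * err for size, err in errors) / len(preds)
    if agglomerate == "max":
        return max(err for _, err in errors)
    raise ValueError(f"unknown agglomeration: {agglomerate}")


if __name__ == "__main__":
    # Toy data (hypothetical): predicted probabilities, binary labels,
    # and one categorical input feature per datapoint.
    preds = [0.2, 0.2, 0.8, 0.8]
    labels = [0, 1, 1, 1]
    feature = ["a", "b", "a", "b"]

    def by_prediction_bin(i: int) -> int:
        # Prediction-based grouping: ten equal-width bins on [0, 1].
        return min(int(preds[i] * 10), 9)

    def by_feature(i: int) -> str:
        # Input-feature-based grouping.
        return feature[i]

    print(round(calibration_score(preds, labels, by_prediction_bin, "max"), 3))  # 0.3
    print(round(calibration_score(preds, labels, by_feature, "max"), 3))  # 0.5
```

On this toy data, both groupings give the same size-weighted mean error (0.25). Under the maximum, however, grouping by the input feature exposes a larger gap in group `b` (0.5) than prediction bins do (0.3). This shows that the grouping and agglomeration choices can change what a calibration score reveals.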
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Statistical and Computational Modeling
