Rashomon Capacity: A Metric for Predictive Multiplicity in Classification
Hsiang Hsu, Flavio du Pin Calmon

TL;DR
The paper introduces Rashomon Capacity, a new metric for measuring predictive multiplicity in probabilistic classifiers, helping to identify conflicting predictions among models with similar performance.
Contribution
It proposes Rashomon Capacity as a novel, rigorous metric for predictive multiplicity applicable to probabilistic classifiers, with practical estimation methods and implications for model transparency.
Findings
Rashomon Capacity effectively captures predictive multiplicity across datasets.
The metric provides a principled way to disclose conflicting models to stakeholders.
Experiments apply the metric to several model classes, including neural networks.
Abstract
Predictive multiplicity occurs when classification models with statistically indistinguishable performances assign conflicting predictions to individual samples. When used for decision-making in applications of consequence (e.g., lending, education, criminal justice), models developed without regard for predictive multiplicity may result in unjustified and arbitrary decisions for specific individuals. We introduce a new metric, called Rashomon Capacity, to measure predictive multiplicity in probabilistic classification. Prior metrics for predictive multiplicity focus on classifiers that output thresholded (i.e., 0-1) predicted classes. In contrast, Rashomon Capacity applies to probabilistic classifiers, capturing more nuanced score variations for individual samples. We provide a rigorous derivation for Rashomon Capacity, argue its intuitive appeal, and demonstrate how to estimate it in…
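To make the idea of score-level multiplicity concrete, the following is a minimal sketch rather than the authors' implementation. The abstract above does not spell out the definition, so the sketch assumes a channel-capacity style measure: the competing models are treated as inputs to a channel, the class scores they assign to one sample as its outputs, and the capacity (in bits) is exponentiated so the result lies between 1 (all models agree) and the number of classes (maximal conflict). Capacity is approximated with the standard Blahut-Arimoto iteration. The function name `score_spread_capacity`, the `scores` layout, and the iteration count are hypothetical choices made for illustration.

```python
import numpy as np


def score_spread_capacity(scores, iters=500):
    """Illustrative capacity-based spread of class scores for ONE sample.

    scores: array of shape (n_models, n_classes); each row is the probability
            vector one model assigns to the sample (rows sum to 1).
    Returns 2 ** capacity, where capacity is in bits. This is an assumed
    formulation for illustration, not a quoted definition.
    """
    P = np.asarray(scores, dtype=float)
    n_models = P.shape[0]
    w = np.full(n_models, 1.0 / n_models)  # weights over models, start uniform

    def kl_rows(P, q):
        # KL divergence (in nats) of each model's score vector from the mixture q.
        with np.errstate(divide="ignore", invalid="ignore"):
            log_ratio = np.where(P > 0, np.log(P / q), 0.0)
        return np.sum(P * log_ratio, axis=1)

    # Blahut-Arimoto: reweight models toward those far from the current mixture.
    for _ in range(iters):
        q = w @ P
        w = w * np.exp(kl_rows(P, q))
        w /= w.sum()

    q = w @ P
    capacity_bits = float(w @ kl_rows(P, q)) / np.log(2)
    return 2.0 ** capacity_bits


# Three hypothetical models that give identical scores: no multiplicity.
agree = [[0.7, 0.3], [0.7, 0.3], [0.7, 0.3]]
# Two hypothetical models that are each certain of a different class.
conflict = [[1.0, 0.0], [0.0, 1.0]]

print(score_spread_capacity(agree))
print(score_spread_capacity(conflict))
```

Under these assumptions the value is 1 when every model assigns the same scores and grows toward the number of classes as the models' scores diverge, which is the kind of per-sample variation that thresholded 0-1 predictions would hide.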
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Bayesian Modeling and Causal Inference
