Predictive Multiplicity in Probabilistic Classification
Jamelle Watson-Daniels, David C. Parkes, Berk Ustun

TL;DR
This paper introduces a framework to measure and analyze the variability in probabilistic classification predictions across near-optimal models, highlighting the importance of considering predictive multiplicity in real-world risk assessments.
Contribution
The paper develops measures and optimization methods to quantify predictive multiplicity in probabilistic classifiers, and analyzes its causes and prevalence in practical applications.
Findings
Predictive multiplicity is common in real-world classification tasks.
Model predictions can vary significantly among near-optimal models.
Data characteristics influence the extent of predictive multiplicity.
Abstract
Machine learning models are often used to inform real-world risk assessment tasks: predicting consumer default risk, predicting whether a person suffers from a serious illness, or predicting a person's risk to appear in court. Given multiple models that perform almost equally well for a prediction task, to what extent do predictions vary across these models? If predictions are relatively consistent for similar models, then the standard approach of choosing the model that optimizes a penalized loss suffices. But what if predictions vary significantly for similar models? In machine learning, this is referred to as predictive multiplicity, i.e., the prevalence of conflicting predictions assigned by near-optimal competing models. In this paper, we present a framework for measuring predictive multiplicity in probabilistic classification (predicting the probability of a positive outcome). We…
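To make the idea of "conflicting predictions among near-optimal models" concrete, here is a minimal sketch in plain Python. It is not the paper's optimization-based method. It assumes a small, finite set of candidate models whose predicted probabilities are already available, keeps those whose log loss is within a tolerance `epsilon` of the best candidate, and reports the range of predicted probabilities per example. The function names, the toy data, and the choice of log loss are all assumptions made for illustration.

```python
import math


def log_loss(y_true, probs):
    """Mean negative log-likelihood of binary labels under predicted probabilities."""
    tiny = 1e-12  # clip to avoid log(0)
    total = 0.0
    for y, p in zip(y_true, probs):
        p = min(max(p, tiny), 1 - tiny)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)


def near_optimal_indices(losses, epsilon):
    """Indices of candidate models whose loss is within epsilon of the best loss."""
    best = min(losses)
    return [i for i, loss in enumerate(losses) if loss <= best + epsilon]


def prediction_ranges(model_probs, y_true, epsilon):
    """Per example, the (min, max) predicted probability over near-optimal models."""
    losses = [log_loss(y_true, probs) for probs in model_probs]
    keep = near_optimal_indices(losses, epsilon)
    return [
        (min(model_probs[m][i] for m in keep),
         max(model_probs[m][i] for m in keep))
        for i in range(len(y_true))
    ]


# Toy, made-up data: three hypothetical models scored on four examples.
y = [1, 0, 1, 0]
candidates = [
    [0.9, 0.1, 0.8, 0.2],  # hypothetical model A
    [0.8, 0.2, 0.9, 0.1],  # hypothetical model B (about the same loss as A)
    [0.5, 0.5, 0.5, 0.5],  # hypothetical model C (much worse, filtered out)
]
ranges = prediction_ranges(candidates, y, epsilon=0.01)
for i, (lo, hi) in enumerate(ranges):
    print(f"example {i}: predicted risk ranges over [{lo}, {hi}]")
```

In this toy setup, models A and B fit the data about equally well yet assign different risk estimates to every example. A wide range for an individual is the kind of signal a multiplicity analysis is meant to surface. Enumerating a fixed candidate set can only understate the variability over all near-optimal models.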
Taxonomy
Topics: Advanced Causal Inference Techniques · Bayesian Modeling and Causal Inference · Statistical Methods and Inference
