Predictive Multiplicity in Classification

Charles T. Marx; Flavio du Pin Calmon; Berk Ustun

arXiv:1909.06677 · cs.LG · September 17, 2020 · 34 cites

TL;DR

This paper defines predictive multiplicity in classification, introduces measures of its severity, and provides integer programming tools to compute those measures exactly for linear classifiers. Its experiments show that real-world datasets can admit competing models with highly conflicting predictions.

Contribution

It formalizes the concept of predictive multiplicity, develops integer programming methods to measure it, and demonstrates its significance in real-world recidivism prediction datasets.

Findings

01 Real-world recidivism prediction datasets can exhibit substantial predictive multiplicity.

02 Competing models that perform almost equally well can assign conflicting predictions.

03 Integer programming tools enable exact measurement of multiplicity for linear classifiers.

Abstract

Prediction problems often admit competing models that perform almost equally well. This effect challenges key assumptions in machine learning when competing models assign conflicting predictions. In this paper, we define predictive multiplicity as the ability of a prediction problem to admit competing models with conflicting predictions. We introduce formal measures to evaluate the severity of predictive multiplicity and develop integer programming tools to compute them exactly for linear classification problems. We apply our tools to measure predictive multiplicity in recidivism prediction problems. Our results show that real-world datasets may admit competing models that assign wildly conflicting predictions, and motivate the need to measure and report predictive multiplicity in model development.
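The abstract does not name its measures, so the following is only a minimal sketch of the idea. It assumes two illustrative measures over a finite pool of candidate models: ambiguity (the share of points whose prediction is flipped by at least one competing model) and discrepancy (the largest share of predictions flipped by any single competing model). The function names, the tolerance `epsilon`, and the toy data are all hypothetical. The paper computes its measures exactly with integer programming over linear classifiers, whereas this sketch simply enumerates a hand-written list of prediction vectors.

```python
def error_rate(preds, labels):
    """Fraction of points where the prediction differs from the label."""
    return sum(p != y for p, y in zip(preds, labels)) / len(labels)


def competing_models(baseline, candidates, labels, epsilon):
    """Keep candidates whose error is within epsilon of the baseline's error."""
    limit = error_rate(baseline, labels) + epsilon
    return [c for c in candidates if error_rate(c, labels) <= limit]


def ambiguity(baseline, competitors):
    """Share of points whose baseline prediction is flipped by some competitor."""
    n = len(baseline)
    flipped = sum(
        any(c[i] != baseline[i] for c in competitors) for i in range(n)
    )
    return flipped / n


def discrepancy(baseline, competitors):
    """Largest share of baseline predictions flipped by a single competitor."""
    n = len(baseline)
    return max(
        (sum(c[i] != baseline[i] for i in range(n)) / n for c in competitors),
        default=0.0,
    )


# Toy data (hypothetical): 8 labeled points and prediction vectors from 4 models.
labels = [1, 1, 0, 0, 1, 0, 1, 0]
baseline = [1, 1, 0, 0, 1, 0, 0, 0]      # 1 mistake out of 8
candidates = [
    [1, 1, 0, 0, 0, 0, 1, 0],            # 1 mistake, differs from baseline on 2 points
    [1, 0, 0, 0, 1, 0, 1, 0],            # 1 mistake, differs from baseline on 2 points
    [0, 0, 1, 1, 1, 0, 1, 0],            # 4 mistakes, not a competing model
]

competitors = competing_models(baseline, candidates, labels, epsilon=0.01)
amb = ambiguity(baseline, competitors)
disc = discrepancy(baseline, competitors)
print(f"{len(competitors)} competing models, ambiguity={amb}, discrepancy={disc}")
```

In this toy case the two competing models match the baseline's error yet disagree with it on different points, so ambiguity is larger than discrepancy. That gap is the kind of conflict the abstract argues should be measured and reported.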

Code & Models

Videos

Predictive Multiplicity in Classification · SlidesLive

Taxonomy

Topics: Bayesian Modeling and Causal Inference · Imbalanced Data Classification Techniques · Computability, Logic, AI Algorithms