On the Utility of Prediction Sets in Human-AI Teams

Varun Babbar; Umang Bhatt; Adrian Weller

arXiv:2205.01411 · cs.AI · May 27, 2022

TL;DR

This paper investigates how conformal prediction sets influence human-AI team decision-making, introduces D-CP to reduce overly large sets, and demonstrates improved team performance and usefulness through experiments.

Contribution

It introduces D-CP, a novel method that combines conformal prediction with deferral to experts, producing more useful and manageable prediction sets in human-AI collaboration.

Findings

1. Prediction sets improve team performance over single predictions.

2. D-CP reduces prediction set size and increases usefulness.

3. Experts prefer D-CP over standard conformal prediction sets.

Abstract

Research on human-AI teams usually provides experts with a single label, which ignores the uncertainty in a model's recommendation. Conformal prediction (CP) is a well-established line of research that focuses on building a theoretically grounded, calibrated prediction set, which may contain multiple labels. We explore how such prediction sets impact expert decision-making in human-AI teams. Our evaluation on human subjects finds that set-valued predictions positively impact experts. However, we notice that the prediction sets provided by CP can be very large, which leads to unhelpful AI assistants. To mitigate this, we introduce D-CP, a method that performs CP on some examples and defers the rest to experts. We prove that D-CP can reduce the prediction set size of non-deferred examples. We show how D-CP performs in quantitative and human-subject experiments ($n = 120$). Our results suggest that CP…
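The abstract describes calibrated prediction sets built with CP and a deferral step that shrinks the sets of the examples that are not deferred. The sketch below illustrates the general idea with standard split conformal prediction, using the nonconformity score 1 minus the model's probability for a label, combined with a hypothetical deferral rule: defer an example when the model's top probability is below a threshold `tau`. This rule, the function names (`deferred_cp`, `should_defer`, etc.), and the toy data are assumptions for illustration; it is not the D-CP procedure from the paper. Because the rule looks only at the model's output, the retained calibration and test points remain i.i.d. draws from the same conditional distribution when the original data are i.i.d., so calibrating on the non-deferred calibration points keeps the usual split-conformal coverage guarantee for non-deferred test points.

```python
import math
from typing import List, Optional, Sequence, Set

Probs = Sequence[float]  # one model probability per class label


def conformal_quantile(scores: Sequence[float], alpha: float) -> float:
    """Return the ceil((n + 1)(1 - alpha))-th smallest calibration score."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: accept every label.
        return math.inf
    return sorted(scores)[k - 1]


def nonconformity(probs: Probs, label: int) -> float:
    # Score used in this sketch: 1 minus the probability assigned to the label.
    return 1.0 - probs[label]


def prediction_set(probs: Probs, qhat: float) -> Set[int]:
    # Keep every label whose score is within the calibrated threshold.
    return {y for y in range(len(probs)) if nonconformity(probs, y) <= qhat}


def should_defer(probs: Probs, tau: float) -> bool:
    # HYPOTHETICAL deferral rule (an assumption, not from the paper):
    # hand the example to the expert when the top probability is below tau.
    return max(probs) < tau


def deferred_cp(
    cal_probs: List[Probs],
    cal_labels: List[int],
    test_probs: List[Probs],
    alpha: float,
    tau: float,
) -> List[Optional[Set[int]]]:
    """Return a prediction set per test example, or None if it is deferred."""
    # Calibrate only on the calibration examples that the rule would keep.
    kept_scores = [
        nonconformity(p, y)
        for p, y in zip(cal_probs, cal_labels)
        if not should_defer(p, tau)
    ]
    qhat = conformal_quantile(kept_scores, alpha)
    return [None if should_defer(p, tau) else prediction_set(p, qhat) for p in test_probs]


# Made-up 3-class data for illustration: six confident and four uncertain examples.
CAL_PROBS = [
    [0.9, 0.05, 0.05], [0.8, 0.1, 0.1], [0.1, 0.8, 0.1],
    [0.05, 0.9, 0.05], [0.1, 0.1, 0.8], [0.2, 0.1, 0.7],
    [0.45, 0.45, 0.1], [0.4, 0.2, 0.4], [0.45, 0.35, 0.2], [0.35, 0.4, 0.25],
]
CAL_LABELS = [0, 0, 1, 1, 2, 2, 2, 1, 2, 0]
TEST_PROBS = [[0.75, 0.22, 0.03], [0.4, 0.35, 0.25]]

if __name__ == "__main__":
    # tau = 0.0 never defers, so this is plain split conformal prediction.
    print(deferred_cp(CAL_PROBS, CAL_LABELS, TEST_PROBS, alpha=0.25, tau=0.0))
    # prints [{0, 1}, {0, 1, 2}]
    # tau = 0.5 defers the uncertain examples to the expert.
    print(deferred_cp(CAL_PROBS, CAL_LABELS, TEST_PROBS, alpha=0.25, tau=0.5))
    # prints [{0}, None]
```

With every calibration point kept, the threshold is 0.8 and the two test examples receive sets of size 2 and 3. With `tau = 0.5`, the four uncertain calibration points are dropped, the threshold falls to about 0.3, the confident test example receives a single-label set, and the uncertain one is deferred. On made-up data, this only mirrors the qualitative claim in the abstract that deferring some examples can shrink the sets for the rest.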