How to Measure Human-AI Prediction Accuracy in Explainable AI Systems

Sujay Koujalgi; Andrew Anderson; Iyadunni Adenuga; Shikha Soneji; Rupika Dikkala; Teresita Guzman Nader; Leo Soccio; Sourav Panda; Rupak Kumar Das; Margaret Burnett; Jonathan Dodge

arXiv:2409.00069·cs.HC·September 4, 2024

Open Access

TL;DR

This paper proposes new mathematical methods to measure partial wrongness in human predictions of AI actions, enabling more nuanced evaluation of explainable AI systems especially in large output spaces.

Contribution

It introduces three mathematical bases for quantifying partial wrongness and demonstrates their application through two decision-making domain analyses.

Findings

01. Effective measurement of partial wrongness in large output spaces.

02. Improved rigor in user studies of explainable AI predictions.

03. Validated methods through in-lab and re-analysis studies.

Abstract

Assessing an AI system's behavior, particularly in Explainable AI Systems, is sometimes done empirically by measuring people's abilities to predict the agent's next move. But how should such measurements be performed? In empirical studies with humans, an obvious approach is to frame the task as binary (i.e., a prediction is either right or wrong), but this does not scale. As output spaces increase, so do floor effects, because the ratio of right answers to wrong answers quickly becomes very small. The crux of the problem is that the binary framing fails to capture the nuances of the different degrees of "wrongness." To address this, we begin by proposing three mathematical bases upon which to measure "partial wrongness." We then use these bases to perform two analyses on sequential decision-making domains: the first is an in-lab study with 86 participants on a size-36 action space; the second…
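The floor effect described in the abstract can be illustrated with a small sketch. The code below is not the paper's method: the abstract does not specify the three mathematical bases, so the graded score here is a hypothetical stand-in. It assumes, purely for illustration, that a size-36 action space is laid out as a 6x6 grid and that "wrongness" grows with Manhattan distance between the predicted and actual cells. All function names are invented for this example.

```python
from fractions import Fraction

GRID_SIDE = 6  # assumption: the size-36 action space is modeled as a 6x6 grid
MAX_DISTANCE = 2 * (GRID_SIDE - 1)  # largest Manhattan distance on that grid


def chance_binary_accuracy(n_actions: int) -> Fraction:
    """Probability that a uniformly random guess is exactly right."""
    return Fraction(1, n_actions)


def right_to_wrong_ratio(n_actions: int) -> Fraction:
    """Ratio of right answers to wrong answers when exactly one action is right."""
    return Fraction(1, n_actions - 1)


def binary_score(predicted: tuple, actual: tuple) -> int:
    """Binary framing: 1 for an exact match, 0 for anything else."""
    return 1 if predicted == actual else 0


def partial_score(predicted: tuple, actual: tuple) -> Fraction:
    """Hypothetical graded score: 1 for an exact match, falling linearly
    to 0 at the maximum Manhattan distance. Not taken from the paper."""
    distance = abs(predicted[0] - actual[0]) + abs(predicted[1] - actual[1])
    return 1 - Fraction(distance, MAX_DISTANCE)


if __name__ == "__main__":
    actual = (2, 3)
    near_miss = (2, 4)
    far_miss = (5, 0)
    print(chance_binary_accuracy(36), right_to_wrong_ratio(36))
    for guess in (actual, near_miss, far_miss):
        print(guess, binary_score(guess, actual), partial_score(guess, actual))
```

Under these assumptions, the binary score treats the near miss and the far miss identically, while the graded score separates them. Any real analysis would need a distance notion suited to the domain.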

Taxonomy

Topics: Explainable Artificial Intelligence (XAI)