Unjustified Sample Sizes and Generalizations in Explainable AI Research: Principles for More Inclusive User Studies
Uwe Peters, Mary Carman

TL;DR
This paper shows that most XAI user studies do not justify their sample sizes and often generalize beyond their target populations, and it proposes principles for more inclusive research practices.
Contribution
It analyzes 220 XAI user studies published between 2012 and 2022, identifies recurring methodological problems (missing sample size rationales and overly broad generalizations), and offers principles to improve inclusivity and validity in future research.
Findings
Most studies lacked sample size rationales
Many generalized beyond their target populations
No evidence that broader conclusions in quantitative studies were correlated with larger samples
Abstract
Many ethical frameworks require artificial intelligence (AI) systems to be explainable. Explainable AI (XAI) models are frequently tested for their adequacy in user studies. Since different people may have different explanatory needs, it is important that participant samples in user studies are large enough to represent the target population to enable generalizations. However, it is unclear to what extent XAI researchers reflect on and justify their sample sizes or avoid broad generalizations across people. We analyzed XAI user studies (n = 220) published between 2012 and 2022. Most studies did not offer rationales for their sample sizes. Moreover, most papers generalized their conclusions beyond their target population, and there was no evidence that broader conclusions in quantitative studies were correlated with larger samples. These methodological problems can impede evaluations of…
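To make the idea of a sample size rationale concrete, here is a minimal sketch of one common form: an a priori power analysis for comparing two participant groups. The paper does not prescribe this method. The function name, the normal approximation, and the default values for effect size, significance level, and power are all illustrative assumptions.

```python
import math
from statistics import NormalDist


def participants_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate participants needed per group for a two-sided,
    two-group comparison of means (normal approximation).

    effect_size is a standardized mean difference (Cohen's d).
    All names and defaults here are hypothetical, for illustration only.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    if not (0 < alpha < 1 and 0 < power < 1):
        raise ValueError("alpha and power must lie strictly between 0 and 1")

    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = standard_normal.inv_cdf(power)          # quantile for the target power

    # n per group = 2 * ((z_alpha + z_power) / d)^2, rounded up
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


if __name__ == "__main__":
    for d in (0.2, 0.5, 0.8):
        print(f"d = {d}: about {participants_per_group(d)} participants per group")
```

The normal approximation slightly underestimates what an exact t-test calculation would require, so a study reporting such a rationale would typically state which method and which assumed effect size it used.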
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Artificial Intelligence in Healthcare and Education · Ethics and Social Impacts of AI
