RCTs & Human Uplift Studies: Methodological Challenges and Practical Solutions for Frontier AI Evaluation

Patricia Paskov; Kevin Wei; Shen Zhou Hong; Dan Bateyko; Xavier Roberts-Gaal; Carson Ezell; Gailius Praninskas; Valerie Chen; Umang Bhatt; Ella Guest

arXiv:2603.11001 · cs.CY · March 12, 2026

Open Access

TL;DR

This paper examines the methodological challenges of using randomized controlled trials to evaluate frontier AI systems' effects on humans, highlighting practical solutions and limitations for high-stakes decision-making.

Contribution

Drawing on interviews with 16 expert practitioners, the paper identifies key challenges and solutions in applying RCTs to frontier AI evaluation.

Findings

01. Evolving AI systems complicate causal inference assumptions.

02. Heterogeneous user proficiency affects study validity.

03. Practitioners report specific solutions for methodological challenges.

Abstract

Human uplift studies, or studies that measure AI effects on human performance relative to a status quo, typically using randomized controlled trial (RCT) methodology, are increasingly used to inform deployment, governance, and safety decisions for frontier AI systems. While the methods underlying these studies are well-established, their interaction with the distinctive properties of frontier AI systems remains underexamined, particularly when results are used to inform high-stakes decisions. We present findings from interviews with 16 expert practitioners with experience conducting human uplift studies in domains including biosecurity, cybersecurity, education, and labor. Across interviews, experts described a recurring tension between standard causal inference assumptions and the object of study itself. Rapidly evolving AI systems, shifting baselines, heterogeneous and changing user…

Taxonomy

Topics: Ethics and Social Impacts of AI · Artificial Intelligence in Healthcare and Education · Advanced Causal Inference Techniques