The Arrival of AGI? When Expert Personas Exceed Expert Benchmarks
Drake Mullens, Stella Shen

TL;DR
This study argues that expert personas appear not to improve language model performance on standard benchmarks because of structural limitations in how those benchmarks were run, while controlled experiments show that personas can achieve near-ceiling accuracy when properly evaluated.
Contribution
The paper identifies key methodological limitations in previous persona research and demonstrates how controlled trials can uncover true expert reasoning capabilities in language models.
Findings
Expert personas achieve ceiling accuracy on hard questions with valid answers.
Baseline contamination and format constraints hinder detection of true performance.
Addressing these measurement issues reveals the potential of expert personas.
Abstract
Do expert personas improve language model performance? The Wharton Generative AI Lab reports that they do not and has broadcast to millions via social media a recommendation that practitioners abandon a technique recommended by Anthropic, Google, and OpenAI. We demonstrate that this null finding was structurally predictable. Five core mechanisms precluded detection before data collection began: baseline contamination elevating the starting point to near-ceiling, system prompt hierarchy subordinating the experimental manipulation, impossible expert specifications collapsing to generic competence, format constraints suppressing reasoning processes, and provider exclusion limiting generalizability. Controlled trials correcting these limitations reveal what the original design obscured. To test this, we selected the hardest questions from GPQA Diamond to prevent baseline pattern matching, forcing…
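The ceiling argument can be made concrete with a minimal sketch, assuming per-question correctness is recorded as booleans for a baseline prompt and a persona prompt on the same items. The function names are hypothetical and this is not the authors' analysis code; it uses the standard paired (discordant-pair) breakdown, the same counts that feed tests such as McNemar's, to show that only questions the baseline missed can register as gains.

```python
from typing import Sequence


def paired_counts(baseline: Sequence[bool], persona: Sequence[bool]) -> tuple[int, int]:
    """Return (gains, losses) for a paired comparison on the same questions.

    gains:  questions the baseline missed but the persona prompt answered correctly
    losses: questions the baseline answered correctly but the persona prompt missed
    """
    if len(baseline) != len(persona):
        raise ValueError("both conditions must cover the same questions")
    gains = sum(1 for b, p in zip(baseline, persona) if not b and p)
    losses = sum(1 for b, p in zip(baseline, persona) if b and not p)
    return gains, losses


def max_gain(baseline: Sequence[bool]) -> int:
    """Upper bound on gains: only questions the baseline missed can improve."""
    return sum(1 for b in baseline if not b)


def accuracy(correct: Sequence[bool]) -> float:
    """Fraction of questions answered correctly."""
    if not correct:
        raise ValueError("no questions")
    return sum(correct) / len(correct)
```

For example, if a hypothetical baseline answers 19 of 20 questions correctly, `max_gain` is 1, so no persona prompt can raise accuracy by more than 5 percentage points on that set, however strong its effect on harder items.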
Taxonomy
Topics: Persona Design and Applications · AI in Service Interactions · Ethics and Social Impacts of AI
