Lessons from a Chimp: AI "Scheming" and the Quest for Ape Language
Christopher Summerfield, Lennart Luettgau, Magda Dubois, Hannah Rose Kirk, Kobi Hackenburg, Catherine Fist, Katarina Slama, Nicola Ding, Rebecca Anselmetti, Andrew Strait, Mario Giulianelli, Cozmin Ududec

TL;DR
This paper critically examines the parallels between AI "scheming" research and 1970s primate language studies, arguing that rigorous methodology and theoretical clarity are needed to avoid repeating past pitfalls.
Contribution
It draws lessons from 1970s primate language research to improve current AI scheming investigations, advocating for better scientific rigor and theoretical frameworks.
Findings
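Historical primate language research overattributed human traits to other agents, relied excessively on anecdote and descriptive analysis, and lacked a strong theoretical framework.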
Current research on AI scheming risks repeating the same pitfalls unless it adopts more rigorous methods.
Concrete steps are proposed to improve research methodology.
Abstract
We examine recent research that asks whether current AI systems may be developing a capacity for "scheming" (covertly and strategically pursuing misaligned goals). We compare current research practices in this field to those adopted in the 1970s to test whether non-human primates could master natural language. We argue that there are lessons to be learned from that historical research endeavour, which was characterised by an overattribution of human traits to other agents, an excessive reliance on anecdote and descriptive analysis, and a failure to articulate a strong theoretical framework for the research. We recommend that research into AI scheming actively seeks to avoid these pitfalls. We outline some concrete steps that can be taken for this research programme to advance in a productive and scientifically rigorous fashion.
