Beyond Behavior: Why AI Evaluation Needs a Cognitive Revolution
Amir Konigsberg

TL;DR
This paper advocates for a cognitive revolution in AI evaluation, emphasizing the need to move beyond behavioral tests to understand internal processes and mechanisms of intelligent systems.
Contribution
It highlights the limitations of behavioral evaluation in AI and proposes an epistemological shift towards understanding internal system processes.
Findings
Behavioral evaluation renders questions about process, mechanism, and internal organization unaskable.
A cognitive revolution in AI evaluation is necessary.
Current evaluation methods overlook differences in the computational processes behind a system's outputs.
Abstract
In 1950, Alan Turing proposed replacing the question "Can machines think?" with a behavioral test: if a machine's outputs are indistinguishable from those of a thinking being, the question of whether it truly thinks can be set aside. This paper argues that Turing's move was not only a pragmatic simplification but also an epistemological commitment, a decision about what kind of evidence counts as relevant to intelligence attribution, and that this commitment has quietly constrained AI research for seven decades. We trace how Turing's behavioral epistemology became embedded in the field's evaluative infrastructure, rendering unaskable a class of questions about process, mechanism, and internal organization that cognitive psychology, neuroscience, and related disciplines learned to ask. We draw a structural parallel to the behaviorist-to-cognitivist transition in psychology: just as…
