Do Language Models Align with Brains? Prediction Scores Are Not Enough
Xiao Jia

TL;DR
This study tests whether current language models genuinely align with brain activity and finds that apparent neural similarities are mostly explained by controls rather than by model-specific alignment.
Contribution
Introduces L-PACT, a source-audited framework that assesses brain-model alignment against multiple controls and shows that previously reported positive signals are mostly control-explained.
Findings
No language model representations passed the strict alignment gates.
Apparent positive evidence is explained by controls rather than true alignment.
L-PACT provides a more reliable assessment of brain-model correspondence.
Abstract
Brain-language model comparisons often interpret neural prediction scores as evidence that model representations capture brain-relevant language computation. We asked whether language models align with brains, and whether prediction scores are enough to support that claim, using L-PACT, a source-audited framework that evaluates predictive, relational, mechanism-stripping, and reliability-bounded evidence. Across primary naturalistic language neural datasets and derived language-model representations, L-PACT compared real model features with nuisance baselines and severe controls, tested whether model-to-brain profiles reproduced brain-to-brain patterns, recomputed held-out scores after mechanism stripping, and normalized evidence against brain-brain ceilings. The locked analysis set contains 414 predictive-control rows, 2304 relational profile rows, 4320 mechanism-stripping rows, 420…
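To make the predictive-control and ceiling-normalization ideas in the abstract concrete, here is a minimal sketch on synthetic data. It is not the L-PACT implementation: the function names, the ridge encoding model, the train/test split, and the ceiling value of 0.9 are all assumptions chosen for illustration. The simulated neural response depends only on nuisance features, while the simulated "model" features are merely correlated with them.

```python
import numpy as np

def ridge_fit_predict(X_train, y_train, X_test, alpha=1.0):
    """Closed-form ridge regression; returns predictions for X_test."""
    gram = X_train.T @ X_train + alpha * np.eye(X_train.shape[1])
    weights = np.linalg.solve(gram, X_train.T @ y_train)
    return X_test @ weights

def pearson(a, b):
    """Pearson correlation between two 1-D arrays."""
    a = a - a.mean()
    b = b - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def control_adjusted_scores(model_X, nuisance_X, y, ceiling, n_train, alpha=1.0):
    """Held-out prediction scores with and without a nuisance control.

    `ceiling` stands in for a brain-brain reliability ceiling (hypothetical here).
    """
    def held_out(X):
        pred = ridge_fit_predict(X[:n_train], y[:n_train], X[n_train:], alpha)
        return pearson(pred, y[n_train:])

    r_model = held_out(model_X)                              # raw prediction score
    r_nuisance = held_out(nuisance_X)                        # control-only score
    r_full = held_out(np.hstack([nuisance_X, model_X]))      # control + model
    return {
        "model_alone": r_model,
        "nuisance": r_nuisance,
        "full": r_full,
        # Gain of model features beyond the control, relative to the ceiling.
        "gain_over_ceiling": (r_full - r_nuisance) / ceiling,
    }

# Synthetic example: the response is driven by nuisance features only.
rng = np.random.default_rng(0)
n_samples, n_train = 400, 300
nuisance_X = rng.normal(size=(n_samples, 3))
model_X = nuisance_X @ rng.normal(size=(3, 5)) + 0.5 * rng.normal(size=(n_samples, 5))
y = nuisance_X @ np.array([1.0, -0.5, 0.25]) + 0.5 * rng.normal(size=n_samples)

result = control_adjusted_scores(model_X, nuisance_X, y, ceiling=0.9, n_train=n_train)
print(result)
```

In this toy setup the model features predict the response on their own, yet they add almost nothing once the nuisance features are included, which is the sense in which a raw prediction score can be control-explained.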
