Brain-Language Model Alignment: Insights into the Platonic Hypothesis and Intermediate-Layer Advantage
Ángela López-Cardona, Sebastián Idesis, Mireia Masias-Bruns, Sergi Abadal, Ioannis Arapakis

TL;DR
This paper reviews 25 recent fMRI studies to assess whether brain and language-model representations align. It finds support for two hypotheses: that models converge toward a representation of the real world as they scale and improve, and that intermediate layers encode richer, more generalizable features.
Contribution
It provides a comprehensive review of recent brain-model alignment studies and explicitly tests their findings against the Platonic Representation Hypothesis and the Intermediate-Layer Advantage.
Findings
Evidence of shared abstract representations between brains and models
Support for the convergence of models toward real-world representations
Intermediate layers encode richer, more generalizable features
Abstract
Do brains and language models converge toward the same internal representations of the world? Recent years have seen a rise in studies of neural activations and model alignment. In this work, we review 25 fMRI-based studies published between 2023 and 2025 and explicitly confront their findings with two key hypotheses: (i) the Platonic Representation Hypothesis -- that as models scale and improve, they converge to a representation of the real world, and (ii) the Intermediate-Layer Advantage -- that intermediate (mid-depth) layers often encode richer, more generalizable features. Our findings provide converging evidence that models and brains may share abstract representational structures, supporting both hypotheses and motivating further research on brain-model alignment.
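As a rough illustration of how an intermediate-layer advantage could be probed, the sketch below scores each model layer against a set of brain responses and reports the best-aligned layer. It is a minimal sketch under stated assumptions, not the method of any reviewed study: linear CKA is swapped in as one common similarity measure (the studies may use other metrics, such as encoding models or RSA), the function names `linear_cka` and `layerwise_alignment` are hypothetical, and the data are random placeholders rather than real fMRI recordings or model activations.

```python
import numpy as np


def linear_cka(X, Y):
    """Linear CKA between two (n_stimuli, n_features) matrices.

    Assumption: rows of X and Y correspond to the same stimuli.
    Returns a value in [0, 1]; 1 means identical up to rotation and scaling.
    """
    # Center each feature (column) across stimuli.
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(X.T @ Y, ord="fro") ** 2
    norm_x = np.linalg.norm(X.T @ X, ord="fro")
    norm_y = np.linalg.norm(Y.T @ Y, ord="fro")
    return float(cross / (norm_x * norm_y))


def layerwise_alignment(layer_activations, brain_responses):
    """Score every layer's activations against the same brain responses."""
    return [linear_cka(acts, brain_responses) for acts in layer_activations]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_stimuli, n_voxels, n_units, n_layers = 200, 50, 64, 6

    # Placeholder data: random "voxel" responses and random "layer" activations.
    brain = rng.standard_normal((n_stimuli, n_voxels))
    layers = [rng.standard_normal((n_stimuli, n_units)) for _ in range(n_layers)]

    scores = layerwise_alignment(layers, brain)
    best_layer = int(np.argmax(scores))
    for depth, score in enumerate(scores):
        print(f"layer {depth}: CKA = {score:.3f}")
    print(f"best-aligned layer: {best_layer}")
```

With real data, an intermediate-layer advantage would show up as the alignment score peaking at a mid-depth layer rather than at the first or last layer. With the random placeholders used here, no such pattern should be expected.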
Taxonomy
Topics: Neurobiology of Language and Bilingualism · Action Observation and Synchronization · Embodied and Extended Cognition
