Evidence from fMRI Supports a Two-Phase Abstraction Process in Language Models
Emily Cheng, Richard J. Antonello

TL;DR
This paper uses evidence from fMRI encoding models to support a two-phase abstraction process in language models. It shows how intermediate representations evolve during training and how they relate to brain responses, with implications for model interpretability.
Contribution
It identifies a two-phase abstraction process in LLMs, supported by fMRI data, and shows that this process emerges during training, with its first ("composition") phase compressed into fewer layers as training continues.
Findings
Hidden states from intermediate layers predict brain responses better than those from the output layers.
The abstraction process arises naturally during training, and its first ("composition") phase is compressed into fewer layers as training continues.
Layerwise encoding performance correlates with the intrinsic dimensionality of representations.
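The third finding refers to the intrinsic dimensionality (ID) of each layer's representations. To make that quantity concrete, the sketch below implements the TwoNN estimator (Facco et al., 2017), a standard nearest-neighbour ID estimator. This is a swapped-in technique chosen for illustration; the summary does not say which estimator the authors used. The function name `two_nn_id` is hypothetical, and the input is assumed to be one layer's hidden states stacked as a (points × hidden size) array.

```python
import numpy as np


def two_nn_id(X: np.ndarray) -> float:
    """Estimate intrinsic dimension with TwoNN (Facco et al., 2017).

    Hypothetical helper for illustration; not necessarily the paper's estimator.
    X is ASSUMED to hold one layer's hidden states, shape (n_points, hidden_size).
    """
    X = np.asarray(X, dtype=float)
    # Pairwise squared Euclidean distances via |a|^2 + |b|^2 - 2ab
    # (O(n^2) memory, fine for a few thousand points).
    sq = np.sum(X * X, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.maximum(d2, 0.0, out=d2)     # clip small negatives caused by round-off
    np.fill_diagonal(d2, np.inf)    # a point is not its own neighbour
    # After partitioning at index 1, column 0 holds the nearest and
    # column 1 the second-nearest squared distance for each point.
    nn = np.partition(d2, 1, axis=1)[:, :2]
    r1, r2 = np.sqrt(nn[:, 0]), np.sqrt(nn[:, 1])
    keep = r1 > 0                   # duplicate points give an undefined ratio; drop them
    mu = r2[keep] / r1[keep]
    # Under TwoNN's local-uniformity assumption, mu follows a Pareto law with
    # exponent d, whose maximum-likelihood estimate is N / sum(log mu).
    return float(keep.sum() / np.sum(np.log(mu)))


if __name__ == "__main__":
    # Synthetic sanity check: a 2-D uniform patch linearly embedded in 10-D.
    rng = np.random.default_rng(0)
    X = rng.uniform(size=(2000, 2)) @ rng.normal(size=(2, 10))
    print(f"estimated ID: {two_nn_id(X):.2f}")
```

Applied layer by layer, this yields an ID profile that could be compared with per-layer encoding scores, for example with `np.corrcoef`. That comparison step is an assumed workflow, not a description of the paper's analysis.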
Abstract
Research has repeatedly demonstrated that intermediate hidden states extracted from large language models (LLMs) can predict measured brain responses to natural language stimuli. Yet very little is known about the representational properties that enable this high prediction performance. Why is it the intermediate layers, and not the output layers, that are most capable at this unique and highly general transfer task? In this work, we show that evidence from language encoding models in fMRI supports the existence of a two-phase abstraction process within LLMs. We use manifold learning methods to show that this abstraction process naturally arises over the course of training a language model and that the first "composition" phase of this abstraction process is compressed into fewer layers as training continues. Finally, we demonstrate a strong correspondence between layerwise encoding…
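The abstract's central tool is the layerwise language encoding model: features from one layer are mapped to measured fMRI responses, and layers are compared by how well they predict held-out data. The sketch below shows one common form of this, ridge regression from each layer's hidden states to every voxel, scored by mean held-out Pearson correlation. It is a minimal illustration, not the authors' pipeline. It assumes the hidden states are already aligned to fMRI time points (real pipelines typically add temporal downsampling and hemodynamic delays, omitted here), uses one fixed regularization strength `alpha` instead of cross-validating it, and all function names are hypothetical.

```python
import numpy as np


def ridge_fit(X: np.ndarray, Y: np.ndarray, alpha: float) -> np.ndarray:
    """Closed-form ridge weights W = (X^T X + alpha I)^-1 X^T Y (hypothetical helper)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ Y)


def voxel_corr(Y_true: np.ndarray, Y_pred: np.ndarray) -> np.ndarray:
    """Pearson correlation between true and predicted responses, one value per voxel."""
    yt = Y_true - Y_true.mean(axis=0)
    yp = Y_pred - Y_pred.mean(axis=0)
    den = np.sqrt((yt ** 2).sum(axis=0) * (yp ** 2).sum(axis=0))
    return (yt * yp).sum(axis=0) / np.maximum(den, 1e-12)


def layerwise_encoding(hidden_states, Y, alpha=10.0, train_frac=0.8):
    """Mean held-out voxel correlation for each layer (hypothetical helper).

    hidden_states: list of (n_timepoints, hidden_size) arrays, one per layer,
        ASSUMED already aligned to fMRI time points.
    Y: (n_timepoints, n_voxels) measured responses.
    """
    # Contiguous split: neighbouring fMRI volumes are autocorrelated.
    split = int(train_frac * Y.shape[0])
    scores = []
    for H in hidden_states:
        # z-score with training statistics only, so the test set does not leak in.
        # Centred training features mean no separate intercept is needed, and
        # Pearson r ignores any constant offset in the predictions.
        mu, sd = H[:split].mean(axis=0), H[:split].std(axis=0) + 1e-8
        Hz = (H - mu) / sd
        W = ridge_fit(Hz[:split], Y[:split], alpha)
        r = voxel_corr(Y[split:], Hz[split:] @ W)
        scores.append(float(r.mean()))
    return scores


if __name__ == "__main__":
    # Synthetic plumbing check only: layer 1 is constructed to carry the signal.
    rng = np.random.default_rng(0)
    layers = [rng.normal(size=(600, 32)) for _ in range(3)]
    Y = layers[1] @ rng.normal(size=(32, 50)) + rng.normal(size=(600, 50))
    print(layerwise_encoding(layers, Y))
```

The synthetic demo is built so that one layer drives the responses; it only checks that the plumbing picks out that layer and says nothing about real LLM or fMRI data.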
Taxonomy
Topics: Topic Modeling · Neurobiology of Language and Bilingualism
