Abstraction Induces the Brain Alignment of Language and Speech Models
Emily Cheng, Aditya R. Vaidya, Richard Antonello

TL;DR
This study shows that the alignment of language and speech models with brain responses is driven by shared semantic abstraction in the models' middle layers, rather than by their next-word prediction properties.
Contribution
The paper provides evidence that the semantic richness and intrinsic dimension of a model layer underpin its alignment with brain activity, emphasizing the role of meaning abstraction.
Findings
Higher-order linguistic features peak in middle layers.
A layer's intrinsic dimension predicts how well it explains fMRI and ECoG signals.
Finetuning increases intrinsic dimension and semantic content.
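To make "intrinsic dimension" concrete, here is a minimal sketch of one common estimator, TwoNN (Facco et al., 2017), applied to a matrix of hidden states with one row per token or stimulus. This summary does not say which estimator the authors used, so TwoNN is a stand-in chosen for illustration, and the function name `twonn_intrinsic_dimension` is hypothetical. The estimator relies on the ratio of each point's second to first nearest-neighbor distance, which follows a Pareto distribution whose shape parameter is the intrinsic dimension.

```python
import numpy as np


def twonn_intrinsic_dimension(X):
    """Maximum-likelihood TwoNN estimate of the intrinsic dimension of the rows of X.

    X: array of shape (n_points, n_features), e.g. one layer's hidden states.
    Illustrative sketch only; not necessarily the estimator used in the paper.
    """
    X = np.asarray(X, dtype=np.float64)

    # Pairwise squared Euclidean distances via the Gram matrix.
    sq_norms = np.sum(X * X, axis=1)
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    np.maximum(d2, 0.0, out=d2)   # clip tiny negatives from round-off
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbor

    # The two smallest distances in each row: first and second nearest neighbor.
    two_smallest = np.partition(d2, 1, axis=1)[:, :2]
    r1 = np.sqrt(two_smallest[:, 0])
    r2 = np.sqrt(two_smallest[:, 1])

    # Drop exact duplicates (r1 == 0), for which the ratio is undefined.
    keep = r1 > 0
    mu = r2[keep] / r1[keep]

    # mu is Pareto-distributed with shape d, so the MLE is n / sum(log(mu)).
    return keep.sum() / np.sum(np.log(mu))
```

Calling this function on the hidden states of each layer in turn gives a layerwise intrinsic-dimension profile, which is the kind of curve whose peak the findings above refer to. The dense distance matrix makes the sketch quadratic in the number of points, so it suits a few thousand rows at most.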
Abstract
Research has repeatedly demonstrated that intermediate hidden states extracted from large language models and speech audio models predict measured brain response to natural language stimuli. Yet, very little is known about the representation properties that enable this high prediction performance. Why is it the intermediate layers, and not the output layers, that are most effective for this unique and highly general transfer task? We give evidence that the correspondence between speech and language models and the brain derives from shared meaning abstraction and not their next-word prediction properties. In particular, models construct higher-order linguistic features in their middle layers, cued by a peak in the layerwise intrinsic dimension, a measure of feature complexity. We show that a layer's intrinsic dimension strongly predicts how well it explains fMRI and ECoG signals; that…
