Far from the Shallow: Brain-Predictive Reasoning Embedding through Residual Disentanglement
Linyang He, Tianjun Zhong, Richard Antonello, Gavin Mischler, Micah Goldblum, Nima Mesgarani

TL;DR
This paper introduces a residual disentanglement method that isolates reasoning-related representations in language model embeddings, revealing a distinct neural signature for reasoning and hierarchical processing in the brain.
Contribution
The study presents a residual disentanglement technique that separates reasoning from other linguistic features in language model embeddings, supporting brain encoding analyses that are less dominated by shallow features.
Findings
Reasoning embeddings predict neural activity beyond other features.
Neural reasoning signals peak later (~350-400 ms), indicating hierarchical processing.
Standard embeddings are biased towards shallow linguistic features.
Abstract
Understanding how the human brain progresses from processing simple linguistic inputs to performing high-level reasoning is a fundamental challenge in neuroscience. While modern large language models (LLMs) are increasingly used to model neural responses to language, their internal representations are highly "entangled," mixing information about lexicon, syntax, meaning, and reasoning. This entanglement biases conventional brain encoding analyses toward linguistically shallow features (e.g., lexicon and syntax), making it difficult to isolate the neural substrates of cognitively deeper processes. Here, we introduce a residual disentanglement method that computationally isolates these components. By first probing an LM to identify feature-specific layers, our method iteratively regresses out lower-level representations to produce four nearly orthogonal embeddings for lexicon, syntax,…
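The iterative regression step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the four levels are lexicon, syntax, meaning, and reasoning (the order the abstract uses when listing the entangled information), uses synthetic arrays in place of the layer activations that probing would select, and swaps in plain least squares for whatever regression the paper uses. The helper names `residualize` and `disentangle` are hypothetical.

```python
import numpy as np


def residualize(target, lower):
    """Return the part of `target` that is not linearly predictable from `lower`."""
    # Append an intercept column so the residual is also mean-centered.
    X = np.hstack([lower, np.ones((lower.shape[0], 1))])
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    return target - X @ coef


def disentangle(layers):
    """Iteratively regress out all lower-level embeddings.

    `layers` is a list of (n_tokens, dim) arrays ordered from the lowest-level
    feature to the highest. The first array is kept as-is; each later array is
    replaced by its residual given everything below it.
    """
    embeddings = [layers[0]]
    for layer in layers[1:]:
        lower = np.hstack(embeddings)
        embeddings.append(residualize(layer, lower))
    return embeddings


# Synthetic stand-ins for feature-specific layers (assumption: real inputs
# would be activations from the layers identified by probing).
rng = np.random.default_rng(0)
n_tokens, dim = 500, 8
lexicon = rng.normal(size=(n_tokens, dim))
syntax = lexicon @ rng.normal(size=(dim, dim)) + rng.normal(size=(n_tokens, dim))
meaning = syntax @ rng.normal(size=(dim, dim)) + rng.normal(size=(n_tokens, dim))
reasoning = meaning @ rng.normal(size=(dim, dim)) + rng.normal(size=(n_tokens, dim))

embeddings = disentangle([lexicon, syntax, meaning, reasoning])
lex_e, syn_e, mean_e, reas_e = embeddings

# Largest inner product between the reasoning residual and any lower-level column.
max_cross = np.abs(reas_e.T @ np.hstack([lex_e, syn_e, mean_e])).max()
print(f"max cross-product with lower levels: {max_cross:.2e}")
```

With ordinary least squares the residuals are orthogonal to the regressors on the fitting data, up to floating-point error. A regularized or cross-validated regression would leave small correlations, which would match the abstract's wording of "nearly orthogonal" embeddings.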
