Explanations of Large Language Models Explain Language Representations in the Brain

Maryam Rahimi; Yadollah Yaghoobzadeh; Mohammad Reza Daliri

arXiv:2502.14671·cs.CL·April 7, 2025

Open Access

TL;DR

This study uses explainable AI techniques to connect large language model representations with brain activity during language comprehension, revealing hierarchical processing stages and the potential for neural validation of AI explanations.

Contribution

The paper introduces a method that applies attribution techniques to link LLM internal states with neural data, advancing understanding of language processing in the brain.

Findings

01

Attribution methods predict brain activity across the language network.

02

Early LLM layers align with initial language processing stages in the brain.

03

Later layers correspond to more advanced language processing stages.

Abstract

Large language models (LLMs) not only exhibit human-like performance but also share computational principles with the brain's language processing mechanisms. While prior research has focused on mapping LLMs' internal representations to neural activity, we propose a novel approach using explainable AI (XAI) to strengthen this link. Applying attribution methods, we quantify the influence of preceding words on LLMs' next-word predictions and use these explanations to predict fMRI data from participants listening to narratives. We find that attribution methods robustly predict brain activity across the language network, revealing a hierarchical pattern: explanations from early layers align with the brain's initial language processing stages, while later layers correspond to more advanced stages. Additionally, layers with greater influence on next-word prediction, reflected in…
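The step of using explanations to predict fMRI data can be illustrated with a generic linear encoding model. The sketch below is not the authors' pipeline: it assumes ridge regression as the mapping, uses randomly generated stand-ins for both the attribution features and the voxel responses, and the names `ridge_fit`, `columnwise_pearson`, and all array sizes are hypothetical choices made for illustration.

```python
import numpy as np


def ridge_fit(X, Y, alpha):
    """Closed-form ridge regression: W = (X^T X + alpha * I)^-1 X^T Y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ Y)


def columnwise_pearson(A, B):
    """Pearson correlation between matching columns of A and B."""
    A = A - A.mean(axis=0)
    B = B - B.mean(axis=0)
    denom = np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=0)
    return (A * B).sum(axis=0) / denom


rng = np.random.default_rng(0)
n_timepoints, n_features, n_voxels = 400, 10, 5  # assumed toy sizes

# Synthetic stand-in for attribution features: one row per fMRI time point,
# one column per attribution score (e.g. influence of a preceding word).
X = rng.standard_normal((n_timepoints, n_features))

# Synthetic stand-in for voxel responses: a linear function of X plus noise.
true_W = rng.standard_normal((n_features, n_voxels))
Y = X @ true_W + 0.5 * rng.standard_normal((n_timepoints, n_voxels))

# Fit on the first 300 time points, evaluate on the held-out remainder.
split = 300
W = ridge_fit(X[:split], Y[:split], alpha=1.0)
scores = columnwise_pearson(X[split:] @ W, Y[split:])

print("held-out correlation per voxel:", scores.round(2))
```

In a real analysis, the feature matrix would come from an attribution method applied to the LLM, the responses from recorded fMRI, and the regularization strength would typically be chosen by cross-validation; the held-out correlation per voxel is the kind of score that encoding studies commonly report.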

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Language and cultural evolution

Methods: ALIGN