Explanations of Large Language Models Explain Language Representations in the Brain
Maryam Rahimi, Yadollah Yaghoobzadeh, Mohammad Reza Daliri

TL;DR
This study uses explainable AI techniques to connect large language model representations with brain activity during language comprehension, revealing hierarchical processing stages and the potential for neural validation of AI explanations.
Contribution
It introduces a novel method that applies attribution techniques to LLM next-word predictions and links the resulting explanations with neural data, advancing understanding of language processing in the brain.
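Here, attribution means scoring how much each preceding word contributes to the model's next-word prediction. The sketch below shows one common attribution method, gradient × input. It runs on a hypothetical toy predictor rather than a real LLM: the context embeddings are summed and then passed through a linear softmax output layer. The vocabulary, weights, and function names are made up for illustration. This is not a reproduction of the specific attribution methods used in the paper.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def grad_x_input(context: Sequence[str], target: str,
                 emb: Dict[str, Vector], out: Dict[str, Vector]) -> List[float]:
    """Gradient x input score of each context word for log p(target | context).

    Hypothetical toy model: h = sum of context embeddings, logit_w = out[w] . h.
    """
    vocab = list(out)
    dim = len(out[target])
    h = [0.0] * dim
    for w in context:
        h = [hi + ei for hi, ei in zip(h, emb[w])]
    probs = softmax([dot(out[w], h) for w in vocab])
    # d log p(target) / dh = out[target] - sum_w p(w) * out[w];
    # dh / de_i is the identity, so every position shares this gradient
    expected = [sum(p * out[w][d] for p, w in zip(probs, vocab)) for d in range(dim)]
    grad = [t - e for t, e in zip(out[target], expected)]
    # Gradient x input: each context word's embedding dotted with the gradient
    return [dot(emb[w], grad) for w in context]


if __name__ == "__main__":
    # Made-up 2-d embeddings and output weights, for illustration only
    emb = {"the": [0.1, 0.0], "cat": [1.0, 0.2], "sat": [0.0, 1.0]}
    out = {"the": [0.0, 0.1], "cat": [0.5, 0.0], "sat": [1.0, 0.3], "mat": [0.2, 0.9]}
    context = ["the", "cat"]
    for word, score in zip(context, grad_x_input(context, "sat", emb, out)):
        print(f"{word}: {score:+.3f}")
```

The toy model is linear in the summed embedding. As a result, the scores add up to the target logit minus the probability-weighted mean logit, which makes a quick sanity check. A real LLM would need automatic differentiation (for example, PyTorch's autograd) in place of the hand-derived gradient.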
Findings
Attribution methods predict brain activity across the language network.
Early LLM layers align with initial language processing stages in the brain.
Later layers correspond to more advanced language processing stages.
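One way to summarize the layer-to-stage correspondence is to ask which layer's explanations best predict each brain region. This is a minimal sketch built on two hypothetical inputs. The first is a table `scores[region][layer]` of held-out prediction accuracies. The second is a list of regions ordered from earlier to later processing stages. The region names and numbers in the usage lines are toy values, not results from the paper.

```python
from typing import Dict, List, Sequence


def preferred_layer(layer_scores: Sequence[float]) -> int:
    """Index of the layer with the highest score (ties go to the earliest layer)."""
    return max(range(len(layer_scores)), key=lambda i: layer_scores[i])


def layer_hierarchy(scores: Dict[str, Sequence[float]],
                    stage_order: List[str]) -> Dict[str, int]:
    """Map each region, listed from earlier to later stage, to its preferred layer."""
    return {region: preferred_layer(scores[region]) for region in stage_order}


def is_hierarchical(prefs: Dict[str, int], stage_order: List[str]) -> bool:
    """True if preferred layers never decrease from earlier to later stages."""
    layers = [prefs[r] for r in stage_order]
    return all(a <= b for a, b in zip(layers, layers[1:]))


if __name__ == "__main__":
    # Toy scores for illustration only, NOT results from the paper
    toy = {"early_region": [0.30, 0.20, 0.10], "late_region": [0.05, 0.15, 0.25]}
    order = ["early_region", "late_region"]
    prefs = layer_hierarchy(toy, order)
    print(prefs, is_hierarchical(prefs, order))  # prints {'early_region': 0, 'late_region': 2} True
```

A stricter analysis would also test whether the ordering is reliable, for instance with a rank correlation against stage order and a permutation test. That step is beyond this sketch.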
Abstract
Large language models (LLMs) not only exhibit human-like performance but also share computational principles with the brain's language processing mechanisms. While prior research has focused on mapping LLMs' internal representations to neural activity, we propose a novel approach using explainable AI (XAI) to strengthen this link. Applying attribution methods, we quantify the influence of preceding words on LLMs' next-word predictions and use these explanations to predict fMRI data from participants listening to narratives. We find that attribution methods robustly predict brain activity across the language network, revealing a hierarchical pattern: explanations from early layers align with the brain's initial language processing stages, while later layers correspond to more advanced stages. Additionally, layers with greater influence on next-word prediction, reflected in…
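The step of using explanations to predict fMRI data is commonly done with a linear encoding model. The abstract does not state which regression or evaluation metric the authors use, so this minimal sketch makes its own assumptions. It fits ridge regression from per-timepoint features `X` (hypothetically, attribution-derived values for each fMRI timepoint) to one voxel's response `y`. It then scores the fit by Pearson correlation on held-out timepoints. Feature construction (such as HRF convolution or downsampling to scan times) is omitted, and `lam` is a hypothetical regularization strength.

```python
import math
import random
from typing import List, Sequence

Matrix = List[List[float]]


def solve(a: Matrix, b: Sequence[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]  # augmented matrix copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ridge(X: Matrix, y: Sequence[float], lam: float) -> List[float]:
    """Ridge weights w = (X^T X + lam * I)^-1 X^T y.

    No intercept term, so X and y are assumed to be centered.
    """
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation between predicted and measured responses."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def encoding_score(X_train: Matrix, y_train: Sequence[float],
                   X_test: Matrix, y_test: Sequence[float],
                   lam: float = 1.0) -> float:
    """Fit on training timepoints, report correlation on held-out timepoints."""
    w = fit_ridge(X_train, y_train, lam)
    pred = [sum(wi * xi for wi, xi in zip(w, row)) for row in X_test]
    return pearson(pred, y_test)


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic stand-in data (NOT fMRI): 2 hypothetical features per timepoint,
    # and a response that is a noisy linear mix of them
    X = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(60)]
    y = [2 * a - b + 0.1 * rng.gauss(0, 1) for a, b in X]
    print(encoding_score(X[:40], y[:40], X[40:], y[40:], lam=1.0))
```

In practice, one such model would be fit per voxel (or per region) and per LLM layer. Comparing the held-out scores across layers is what yields the layer-to-region pattern described in the findings. With real fMRI data, the train/test split would normally respect story or run boundaries rather than cut one time series at an arbitrary point.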
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Language and cultural evolution
Methods: ALIGN
