Stroke Lesions as a Rosetta Stone for Language Model Interpretability

Julius Fridriksson (1,2), Roger D. Newman-Norlund (1,2), Saeed Ahmadi (1), Regan Willis (3), Nadra Salman (4), Kalil Warren (4), Xiang Guan (3), Yong Yang (3), Srihari Nelakuditi (3), Rutvik Desai (5), Leonardo Bonilha (6), Jeff Charney (2,7), Chris Rorden (5)

(1) University of South Carolina; (2) ALLT.AI, LLC; (3) University of South Carolina, Department of Computer Science and Engineering; (4) University of South Carolina, Linguistics Program; (5) Department of Psychology, University of South Carolina; (6) Department of Neurology, USC School of Medicine; (7) MKHSTRY, LLC

arXiv:2602.04074 · cs.LG · February 5, 2026

TL;DR

This paper introduces BLUM, a framework that uses lesion-symptom mapping from stroke patients to externally validate and interpret the internal components of large language models, bridging neuroscience and AI.

Contribution

The study presents a novel approach that leverages human brain lesion data to evaluate and interpret LLM perturbations, providing external validation for model interpretability.

Findings

1. LLM error profiles align with human lesion patterns in 67-68% of cases.

2. Semantic errors map onto ventral-stream lesions; phonemic errors map onto dorsal-stream lesions.

3. External validation using stroke data enhances understanding of LLM internal mechanisms.

Abstract

Large language models (LLMs) have achieved remarkable capabilities, yet methods to verify which model components are truly necessary for language function remain limited. Current interpretability approaches rely on internal metrics and lack external validation. Here we present the Brain-LLM Unified Model (BLUM), a framework that leverages lesion-symptom mapping, the gold standard for establishing causal brain-behavior relationships for over a century, as an external reference structure for evaluating LLM perturbation effects. Using data from individuals with chronic post-stroke aphasia (N = 410), we trained symptom-to-lesion models that predict brain damage location from behavioral error profiles, applied systematic perturbations to transformer layers, administered identical clinical assessments to perturbed LLMs and human patients, and projected LLM error profiles into human lesion…
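The symptom-to-lesion step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' model: it assumes, purely for illustration, that each lesion is summarized as a vector of per-region damage values and that a ridge regression maps error profiles to those vectors. The function names, the array shapes, and the synthetic data are all hypothetical.

```python
import numpy as np


def fit_symptom_to_lesion(errors, lesions, ridge=1.0):
    """Fit a ridge-regression map from error profiles to lesion vectors.

    errors:  (n_patients, n_error_types) behavioral error rates
    lesions: (n_patients, n_regions) per-region damage values
    Ridge regression is an assumption here, not the paper's stated method.
    """
    mean_x = errors.mean(axis=0)
    mean_y = lesions.mean(axis=0)
    x = errors - mean_x
    y = lesions - mean_y
    # Closed-form ridge solution: (X^T X + ridge * I)^-1 X^T Y
    gram = x.T @ x + ridge * np.eye(x.shape[1])
    weights = np.linalg.solve(gram, x.T @ y)
    return weights, mean_x, mean_y


def project_to_lesion_space(profile, weights, mean_x, mean_y):
    """Map one error profile (human or perturbed LLM) to a predicted lesion vector."""
    return (profile - mean_x) @ weights + mean_y


# Synthetic stand-in data; no real patient or model data is used.
rng = np.random.default_rng(0)
n_patients, n_error_types, n_regions = 200, 4, 6
true_weights = rng.standard_normal((n_error_types, n_regions))
patient_errors = rng.standard_normal((n_patients, n_error_types))
patient_lesions = patient_errors @ true_weights + 0.5

weights, mean_x, mean_y = fit_symptom_to_lesion(
    patient_errors, patient_lesions, ridge=1e-8
)

# Hypothetical error profile measured from a perturbed LLM on the same tests.
llm_profile = rng.standard_normal(n_error_types)
predicted_lesion = project_to_lesion_space(llm_profile, weights, mean_x, mean_y)
print(predicted_lesion.shape)
```

Because the model is fit only on human data, a perturbed LLM's error profile is placed in the same lesion space as the patients, which is what makes the human data usable as an external reference in this kind of setup.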

Taxonomy

Topics: Neurobiology of Language and Bilingualism · Artificial Intelligence in Healthcare and Education · Machine Learning in Healthcare