Artificial Aphasias in Lesioned Language Models
Nathan Roll, Jill Kries, Laura Gwilliams, Cory Shain

TL;DR
This paper introduces an aphasia-inspired method to analyze language models by lesioning parameters and comparing resulting symptoms to human aphasia profiles, revealing differences in functional organization.
Contribution
It presents a novel lesion-based technique to characterize language model organization and compares emergent symptoms with human aphasia, highlighting model-specific differences.
Findings
Lesions in different model components produce distinct symptom profiles.
Early-layer lesions cause syntactic and semantic deficits, whereas late-layer lesions affect phonology and fluency.
Model lesion symptoms differ qualitatively from human aphasia patterns.
Abstract
Aphasias, selective language impairments that can arise from brain damage, reveal the functional organization of human language by providing causal links between affected brain regions and specific symptom profiles. Drawing on this literature, we introduce an aphasia-inspired technique to characterize the emergent functional organization of language models (LMs). We "lesion" (zero out) model parameters and measure the effects of this intervention against clinical aphasia symptoms, as diagnosed by the Text Aphasia Battery (TAB). When applied to 112,426 outputs from five 1B-scale LMs, the full range of evaluated symptoms surfaces, but in distributions largely distinct from those of humans. Our method uncovers broad symptom-profile differences between attention components (query, key, value, output) and feed-forward components (up, gate, down), with weaker evidence for differences among…
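The zero-out intervention described in the abstract can be illustrated with a minimal sketch. This is a toy under stated assumptions, not the authors' implementation: the parameter dictionary, the `layers.<i>.<component>` naming scheme, and the helpers `make_toy_params` and `lesion` are all hypothetical, and lesioning one whole component matrix per layer is an assumed granularity that the abstract does not specify. Only the component names (query, key, value, output, up, gate, down) come from the abstract.

```python
import copy

# Component names taken from the abstract; the "layers.<i>.<component>" key
# scheme is a hypothetical convention used only for this toy example.
ATTENTION = ("query", "key", "value", "output")
FEED_FORWARD = ("up", "gate", "down")


def make_toy_params(n_layers=2, dim=2):
    """Build a toy parameter dict with every weight set to 1.0."""
    return {
        f"layers.{i}.{name}": [[1.0] * dim for _ in range(dim)]
        for i in range(n_layers)
        for name in ATTENTION + FEED_FORWARD
    }


def lesion(params, layer, component):
    """Return a copy of params with one component of one layer zeroed out."""
    key = f"layers.{layer}.{component}"
    if key not in params:
        raise KeyError(f"no parameter named {key!r}")
    lesioned = copy.deepcopy(params)  # leave the intact model untouched
    lesioned[key] = [[0.0] * len(row) for row in lesioned[key]]
    return lesioned


params = make_toy_params()
damaged = lesion(params, layer=0, component="query")
```

In a real setting the same idea would presumably be applied to the weight tensors of an actual LM, with the lesioned model's generations then scored for symptoms; here the copy-then-zero design simply keeps the intact parameters available for comparison.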
