Unveiling Language Competence Neurons: A Psycholinguistic Approach to Model Interpretability
Xufeng Duan, Xinyu Zhou, Bei Xiao, Zhenguang G. Cai

TL;DR
This paper uses psycholinguistic paradigms to analyze neuron-level representations in GPT-2-XL, revealing how specific neurons correspond to linguistic abilities and advancing interpretability of language models.
Contribution
It introduces a novel psycholinguistic approach to probe neuron-level language competence in large language models, linking neuron activity to linguistic abilities.
Findings
GPT-2-XL struggles with sound-shape tasks
GPT-2-XL shows human-like sound-gender and causality abilities
Neuron ablation links specific neurons to linguistic skills
Abstract
As large language models (LLMs) advance in their linguistic capacity, understanding how they capture aspects of language competence remains a significant challenge. This study therefore employs psycholinguistic paradigms in English, which are well-suited for probing deeper cognitive aspects of language processing, to explore neuron-level representations in a language model across three tasks: sound-shape association, sound-gender association, and implicit causality. Our findings indicate that while GPT-2-XL struggles with the sound-shape task, it demonstrates human-like abilities in both sound-gender association and implicit causality. Targeted neuron ablation and activation manipulation reveal a crucial relationship: When GPT-2-XL displays a linguistic ability, specific neurons correspond to that competence; conversely, the absence of such an ability indicates a lack of specialized…
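To make the idea of neuron ablation and activation manipulation concrete, here is a minimal sketch on a toy one-hidden-layer network. It is not the authors' code and does not touch GPT-2-XL: the weights, the input, the ReLU nonlinearity, and the names `forward` and `ablation_effects` are all assumptions chosen for illustration. The sketch zeroes one hidden neuron at a time and records how much the output changes, which is the basic logic behind attributing a behavior to specific neurons.

```python
def forward(x, w_in, w_out, ablate=(), scale=None):
    """Toy one-hidden-layer network (illustrative only, not GPT-2-XL).

    ablate: indices of hidden neurons whose activation is forced to zero.
    scale:  optional {index: factor} that multiplies a neuron's activation.
    """
    scale = scale or {}
    hidden = []
    for j, weights in enumerate(w_in):
        pre = sum(w * xi for w, xi in zip(weights, x))
        act = max(0.0, pre)  # ReLU, a simplifying assumption
        if j in ablate:
            act = 0.0  # ablation: silence this neuron
        act *= scale.get(j, 1.0)  # activation manipulation
        hidden.append(act)
    return sum(w * h for w, h in zip(w_out, hidden))


def ablation_effects(x, w_in, w_out):
    """Return, per hidden neuron, the output drop caused by ablating it."""
    base = forward(x, w_in, w_out)
    return [
        base - forward(x, w_in, w_out, ablate={j})
        for j in range(len(w_in))
    ]


# Hypothetical weights and input, chosen so the arithmetic is easy to follow.
W_IN = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
W_OUT = [2.0, 0.5, 3.0]
X = [1.0, 2.0]

effects = ablation_effects(X, W_IN, W_OUT)
amplified = forward(X, W_IN, W_OUT, scale={0: 2.0})
```

In this toy setup the third neuron is inactive for the given input, so ablating it changes nothing, while the first neuron carries most of the output. In a real model the same comparison would be made on a task-level measure rather than a single scalar output.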
Taxonomy
Topics: Interpreting and Communication in Healthcare · Natural Language Processing Techniques
