Unveiling Language Competence Neurons: A Psycholinguistic Approach to Model Interpretability

Xufeng Duan; Xinyu Zhou; Bei Xiao; Zhenguang G. Cai

arXiv:2409.15827 · cs.CL · December 12, 2024

TL;DR

This paper uses psycholinguistic paradigms to analyze neuron-level representations in GPT-2-XL, revealing how specific neurons correspond to linguistic abilities and advancing interpretability of language models.

Contribution

It introduces a novel psycholinguistic approach to probe neuron-level language competence in large language models, linking neuron activity to linguistic abilities.

Findings

01. GPT-2-XL struggles with the sound-shape task.
02. GPT-2-XL shows human-like sound-gender association and implicit causality.
03. Neuron ablation links specific neurons to specific linguistic abilities.

Abstract

As large language models (LLMs) advance in their linguistic capacity, understanding how they capture aspects of language competence remains a significant challenge. This study therefore employs psycholinguistic paradigms in English, which are well suited to probing deeper cognitive aspects of language processing, to explore neuron-level representations in a language model across three tasks: sound-shape association, sound-gender association, and implicit causality. Our findings indicate that while GPT-2-XL struggles with the sound-shape task, it demonstrates human-like abilities in both sound-gender association and implicit causality. Targeted neuron ablation and activation manipulation reveal a crucial relationship: when GPT-2-XL displays a linguistic ability, specific neurons correspond to that competence; conversely, the absence of such an ability indicates a lack of specialized…
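The abstract does not spell out the ablation procedure. One common way to run this kind of experiment is zero-ablation: set the activations of selected neurons to zero, rerun the model, and measure how much its preference between two candidate continuations changes; activation manipulation scales those activations instead of zeroing them. The sketch below illustrates that idea on a toy one-hidden-layer network in plain Python. It is an assumption-laden illustration, not the authors' code: the weights, the input, and the names `forward`, `preference`, and `ablation_effects` are hypothetical and have nothing to do with GPT-2-XL's actual parameters.

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical toy weights, NOT GPT-2-XL parameters.
# Each row of W_IN defines one hidden "neuron" (3 neurons, 2 input features).
W_IN: List[List[float]] = [
    [1.0, 0.0],
    [0.0, 1.0],
    [0.5, 0.5],
]
# Each row of W_OUT produces the logit of one candidate continuation
# (index 0 = continuation A, index 1 = continuation B, e.g. "he" vs "she").
W_OUT: List[List[float]] = [
    [2.0, 0.0, 0.1],
    [0.0, 2.0, 0.1],
]


def forward(x: Sequence[float], scales: Optional[Dict[int, float]] = None) -> List[float]:
    """Run the toy network and return the two continuation logits.

    `scales` maps a neuron index to a multiplier on its activation:
    0.0 zero-ablates the neuron, values above 1.0 amplify it.
    """
    scales = scales or {}
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in W_IN]  # ReLU
    hidden = [h * scales.get(i, 1.0) for i, h in enumerate(hidden)]
    return [sum(w * h for w, h in zip(row, hidden)) for row in W_OUT]


def preference(logits: Sequence[float]) -> float:
    """log P(A) - log P(B) under a softmax over the two logits.

    The softmax normaliser cancels, so this equals the logit difference;
    positive values mean the model prefers continuation A.
    """
    return logits[0] - logits[1]


def ablation_effects(x: Sequence[float], neurons: Sequence[int]) -> Dict[int, float]:
    """Drop in preference caused by zero-ablating each neuron on its own."""
    baseline = preference(forward(x))
    return {n: baseline - preference(forward(x, {n: 0.0})) for n in neurons}


if __name__ == "__main__":
    x = [1.0, 0.2]  # hypothetical input features for one test item
    print("baseline preference:", preference(forward(x)))
    print("ablation effects:", ablation_effects(x, list(range(len(W_IN)))))
    print("neuron 0 amplified x2:", preference(forward(x, {0: 2.0})))
```

In this toy, neuron 0 carries most of the preference for continuation A, so ablating it flips the preference toward B, while neuron 2 feeds both logits equally and ablating it leaves the preference unchanged. In a real model, the same scaling would typically be applied to selected MLP activations through a hook (for example a PyTorch forward hook) and the effect averaged over many test items.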


Taxonomy

Topics: Interpreting and Communication in Healthcare · Natural Language Processing Techniques