On the Multilingual Ability of Decoder-based Pre-trained Language Models: Finding and Controlling Language-Specific Neurons
Takeshi Kojima, Itsuki Okimura, Yusuke Iwasawa, Hitomi Yanaka, Yutaka Matsuo

TL;DR
This paper investigates how decoder-based multilingual pre-trained language models internally represent different languages, identifying language-specific neurons mainly in early and late layers, and shows that altering these neurons significantly impacts language generation.
Contribution
The study reveals the existence and distribution of language-specific neurons in decoder-based PLMs and demonstrates their critical role in language generation through targeted neuron tampering.
Findings
Language-specific neurons are mostly in early and late layers.
Language-specific neurons overlap by less than 5% between languages.
Tampering with language-specific neurons alters language output significantly.
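The overlap finding can be made concrete with a small sketch. This summary does not state how the paper defines overlap, so the Jaccard ratio used here is an assumption, and the function names and toy neuron identifiers (layer, index pairs) are hypothetical and not taken from the paper.

```python
from itertools import combinations


def overlap_ratio(a, b):
    """Jaccard overlap between two sets of neuron identifiers (assumed metric)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_overlap(neurons_by_lang):
    """Overlap ratio for every unordered pair of languages, keyed alphabetically."""
    return {
        (x, y): overlap_ratio(neurons_by_lang[x], neurons_by_lang[y])
        for x, y in combinations(sorted(neurons_by_lang), 2)
    }


# Toy data only: (layer, neuron index) pairs invented for illustration.
toy = {
    "en": {(0, 1), (0, 2), (31, 7)},
    "de": {(0, 2), (1, 4), (31, 9)},
    "ja": {(0, 5), (30, 3), (31, 8)},
}
overlaps = pairwise_overlap(toy)
for pair, ratio in overlaps.items():
    print(pair, round(ratio, 3))
```

With real neuron sets, a result consistent with the finding above would show every pairwise ratio below 0.05.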
Abstract
Current decoder-based pre-trained language models (PLMs) successfully demonstrate multilingual capabilities. However, it is unclear how these models handle multilingualism. We analyze the neuron-level internal behavior of multilingual decoder-based PLMs, specifically examining the existence of neurons that fire "uniquely for each language" within decoder-only multilingual PLMs. We analyze six languages: English, German, French, Spanish, Chinese, and Japanese, and show that language-specific neurons are unique, with a slight overlap (< 5%) between languages. These neurons are mainly distributed in the models' first and last few layers. This trend remains consistent across languages and models. Additionally, we tamper with less than 1% of the total neurons in each model during inference and demonstrate that tampering with a few language-specific neurons drastically changes the…
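As a rough illustration of what tampering with a small share of neurons during inference could look like, the sketch below overwrites selected activations in one layer with a fixed value and checks what fraction of the model is touched. The abstract as shown here does not say which replacement value or mechanism the authors use, so the fixed value, the helper names, and the toy layer sizes are all assumptions.

```python
def tamper(activations, neuron_ids, value):
    """Return a copy of one layer's activations with the chosen neurons overwritten.

    `value` is a hypothetical fixed replacement, not the paper's stated choice.
    """
    out = list(activations)
    for i in neuron_ids:
        out[i] = value
    return out


def fraction_tampered(neurons_per_layer, tampered_by_layer):
    """Share of all neurons in the model that are being overwritten."""
    total = sum(neurons_per_layer)
    touched = sum(len(ids) for ids in tampered_by_layer.values())
    return touched / total


# Toy example: one layer with four neurons; neurons 1 and 3 are forced to 1.0.
layer_acts = [0.1, -0.3, 0.7, 0.0]
forced = tamper(layer_acts, {1, 3}, 1.0)
print(forced)

# Toy model: 4 layers of 1000 neurons, with 3 neurons tampered in total.
share = fraction_tampered([1000] * 4, {0: {1, 3}, 3: {5}})
print(share)
```

In an actual PyTorch model, one common way to intercept activations is a forward hook (`torch.nn.Module.register_forward_hook`); whether the authors did this is not stated in this summary.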
Taxonomy
Topics: Topic Modeling
