Language-Specific Neurons: The Key to Multilingual Capabilities in Large Language Models

Tianyi Tang; Wenyang Luo; Haoyang Huang; Dongdong Zhang; Xiaolei Wang; Xin Zhao; Furu Wei; Ji-Rong Wen

arXiv:2402.16438 · cs.CL · June 7, 2024 · 1 citation

TL;DR

This paper identifies language-specific neurons in large language models using a novel detection method, revealing that a small subset of neurons, concentrated in specific layers, drives multilingual processing and can be manipulated to steer the output language.

Contribution

The paper introduces language activation probability entropy (LAPE), a new method for detecting language-specific neurons, and demonstrates both their role in multilingual capabilities and their potential for controlling the model's output language.
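The name LAPE suggests a two-step score: estimate how often each neuron activates on text in each language, then measure how spread out those probabilities are across languages. The sketch below is a minimal illustration under stated assumptions: a neuron's activation probability for a language is taken to be the fraction of that language's tokens on which its post-activation FFN value is positive, and LAPE is taken to be the entropy of these probabilities after normalizing them to sum to one across languages. The function names `activation_probability` and `lape` are hypothetical and not taken from the official repository.

```python
import math
from typing import List, Sequence


def activation_probability(token_activations: Sequence[Sequence[float]]) -> List[float]:
    """Fraction of tokens on which each neuron's activation is positive.

    token_activations[t][j] is the value of neuron j (after the FFN activation
    function) on token t, collected from text in a single language.
    """
    num_tokens = len(token_activations)
    num_neurons = len(token_activations[0])
    counts = [0] * num_neurons
    for token in token_activations:
        for j, value in enumerate(token):
            if value > 0:
                counts[j] += 1
    return [c / num_tokens for c in counts]


def lape(per_language_probs: Sequence[float]) -> float:
    """Entropy of one neuron's activation probabilities, normalized across languages.

    Low entropy means the neuron fires mostly for one (or a few) languages.
    """
    total = sum(per_language_probs)
    if total == 0:
        # Assumption of this sketch: a neuron that never fires gets +inf,
        # so it is never ranked as language-specific.
        return math.inf
    entropy = 0.0
    for p in per_language_probs:
        q = p / total  # L1-normalized share of this language
        if q > 0:
            entropy -= q * math.log(q)
    return entropy


# Hypothetical activations for one language: 2 tokens x 2 neurons.
print(activation_probability([[0.5, -0.1], [0.2, 0.3]]))  # [1.0, 0.5]
# Hypothetical activation probabilities over three languages.
print(lape([0.9, 0.0, 0.0]))            # 0.0 (fires for one language only)
print(round(lape([0.3, 0.3, 0.3]), 4))  # 1.0986, i.e. ln 3 (uniform)
```

Under these assumptions, lower scores mark neurons whose activity is concentrated in fewer languages, which is what makes entropy a natural ranking criterion for language specificity.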

Findings

01. Language-specific neurons are located mainly in the top and bottom layers.
02. A small subset of neurons accounts for the model's proficiency in each language.
03. Selectively activating or deactivating these neurons can steer the output language.
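Steering, as described in finding 03, amounts to overriding the activations of selected neurons during the forward pass. A minimal sketch on a plain list holding one layer's FFN activations, assuming that "deactivate" means forcing a neuron's activation to zero and "activate" means forcing it to some chosen value. The function name `steer_layer` and the way the forced value is chosen are hypothetical, not taken from the paper or its repository.

```python
from typing import Dict, List, Optional, Sequence, Set


def steer_layer(
    activations: Sequence[float],
    deactivate: Optional[Set[int]] = None,
    activate: Optional[Dict[int, float]] = None,
) -> List[float]:
    """Return a copy of one layer's FFN neuron activations with some neurons overridden.

    deactivate: indices forced to 0.0 (e.g. neurons specific to the source language).
    activate: index -> value to force (e.g. neurons specific to the target language).
    """
    steered = list(activations)  # never mutate the caller's activations
    for j in deactivate or ():
        steered[j] = 0.0
    # Applied after deactivation, so an index in both collections ends up activated.
    for j, value in (activate or {}).items():
        steered[j] = value
    return steered


# Hypothetical layer with 4 neurons: silence neuron 2, force neuron 1 to 0.9.
print(steer_layer([0.8, -0.2, 1.5, 0.1], deactivate={2}, activate={1: 0.9}))
# [0.8, 0.9, 0.0, 0.1]
```

In a PyTorch model, the same override could be applied to the tensor produced by the FFN activation, for example inside a forward hook registered with `torch.nn.Module.register_forward_hook`; which submodule emits that tensor depends on the model implementation.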

Abstract

Large language models (LLMs) demonstrate remarkable multilingual capabilities without being pre-trained on specially curated multilingual parallel corpora. It remains a challenging problem to explain the underlying mechanisms by which LLMs process multilingual texts. In this paper, we delve into the composition of Transformer architectures in LLMs to pinpoint language-specific regions. Specifically, we propose a novel detection method, language activation probability entropy (LAPE), to identify language-specific neurons within LLMs. Based on LAPE, we conduct comprehensive experiments on several representative LLMs, such as LLaMA-2, BLOOM, and Mistral. Our findings indicate that LLMs' proficiency in processing a particular language is predominantly due to a small subset of neurons, primarily situated in the models' top and bottom layers. Furthermore, we showcase the feasibility of "steering"…
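The claim that a small subset of neurons, concentrated in the top and bottom layers, carries language proficiency suggests a simple check: rank all neurons by LAPE, keep the lowest-entropy fraction, and count how many fall in each layer. A minimal sketch, assuming the LAPE scores are already computed as `scores[layer][neuron]`; the 1% default cutoff, the function names, and the toy scores are illustrative assumptions, not values or APIs taken from the paper or its repository.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def select_low_entropy_neurons(
    scores: Sequence[Sequence[float]], fraction: float = 0.01
) -> List[Tuple[int, int]]:
    """Return (layer, neuron) pairs whose LAPE falls in the lowest `fraction` overall."""
    flat = [
        (score, layer, neuron)
        for layer, row in enumerate(scores)
        for neuron, score in enumerate(row)
    ]
    k = max(1, math.floor(len(flat) * fraction))  # keep at least one neuron
    flat.sort()  # ascending LAPE: most language-specific first
    return [(layer, neuron) for _, layer, neuron in flat[:k]]


def count_per_layer(selected: Sequence[Tuple[int, int]], num_layers: int) -> List[int]:
    """Number of selected neurons in each layer, including layers with none."""
    counts = Counter(layer for layer, _ in selected)
    return [counts.get(i, 0) for i in range(num_layers)]


# Toy LAPE scores for a hypothetical 3-layer model with 3 neurons per layer.
scores = [[0.1, 1.0, 1.1], [1.0, 1.1, 1.0], [0.2, 0.05, 1.0]]
selected = select_low_entropy_neurons(scores, fraction=0.34)
print(selected)                                 # [(2, 1), (0, 0), (2, 0)]
print(count_per_layer(selected, num_layers=3))  # [1, 0, 2]
```

Ranking globally rather than per layer lets the per-layer counts reveal where the language-specific neurons concentrate instead of forcing an equal share into every layer.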

Code & Models

Repositories

rucaibox/language-specific-neurons
PyTorch · Official

Videos

Language-Specific Neurons: The Key to Multilingual Capabilities in Large Language Models · Underline

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling

Methods: Attention Is All You Need · Linear Layer · Byte Pair Encoding · Dropout · Dense Connections · Label Smoothing · Adam · Softmax · Layer Normalization · BLOOM