How do Large Language Models Handle Multilingualism?
Yiran Zhao, Wenxuan Zhang, Guizhen Chen, Kenji Kawaguchi, Lidong Bing

TL;DR
This paper investigates how large language models process multiple languages, proposing a multilingual workflow model and a neuron detection method to improve and understand their multilingual capabilities.
Contribution
It introduces the $\texttt{MWork}$ model of multilingual processing and the $\texttt{PLND}$ method for neuron identification without labeled data, enabling targeted fine-tuning.
Findings
Validation of $\texttt{MWork}$ through neuron deactivation experiments
Fine-tuning language-specific neurons improves multilingual performance
Average improvements of 3.6% and 2.3% for high- and low-resource languages, respectively
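The summary does not say how fine-tuning is restricted to language-specific neurons. One common way to do this is to mask the gradient so that only the selected neurons' weights change. The snippet below is a minimal sketch under that assumption, on a toy weight matrix; `masked_sgd_step` and its parameters are hypothetical names and are not taken from the paper.

```python
import numpy as np

def masked_sgd_step(w, grad, neuron_ids, lr=0.1):
    """One SGD step that updates only the columns (neurons) in neuron_ids.

    w, grad: arrays of shape (n_inputs, n_neurons).
    neuron_ids: indices of the neurons allowed to change (hypothetical input,
    e.g. the output of a neuron detection step).
    """
    mask = np.zeros(w.shape[1], dtype=bool)
    mask[list(neuron_ids)] = True
    # The mask broadcasts across rows, so unselected columns get a zero update.
    return w - lr * grad * mask

# Toy example: three neurons, only neuron 1 is treated as language-specific.
w = np.ones((2, 3))
grad = np.ones((2, 3))
w_new = masked_sgd_step(w, grad, neuron_ids={1}, lr=0.5)
```

Here only the middle column of `w_new` differs from `w`; the other neurons keep their original weights, which is the property that would let one language be tuned without disturbing the rest.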
Abstract
Large language models (LLMs) have demonstrated impressive capabilities across diverse languages. This study explores how LLMs handle multilingualism. Based on observed language ratio shifts among layers and the relationships between network structures and certain capabilities, we hypothesize the LLM's multilingual workflow ($\texttt{MWork}$): LLMs initially understand the query, converting multilingual inputs into English for task-solving. In the intermediate layers, they employ English for thinking and incorporate multilingual knowledge with self-attention and feed-forward structures, respectively. In the final layers, LLMs generate responses aligned with the original language of the query. To verify $\texttt{MWork}$, we introduce Parallel Language-specific Neuron Detection ($\texttt{PLND}$) to identify activated neurons for inputs in different languages without any labeled data. Using…
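To make the idea of label-free neuron detection concrete, here is a minimal sketch on a toy feed-forward layer. It is not the paper's $\texttt{PLND}$ implementation. It assumes that a neuron counts as activated for an input when zeroing it changes the layer output by more than a threshold, and that language-specific neurons are those activated on every input of one language but not shared by all languages. All function names, the threshold, and the toy weights are illustrative assumptions.

```python
import numpy as np

def neuron_importance(x, w_in, w_out):
    """Change in the layer output norm when each hidden neuron is zeroed."""
    h = np.maximum(x @ w_in, 0.0)  # ReLU hidden activations, shape (n_neurons,)
    # Zeroing neuron i removes h[i] * w_out[i] from the output, so the
    # change in output norm is |h[i]| * ||w_out[i]||.
    return np.abs(h) * np.linalg.norm(w_out, axis=1)

def activated_neurons(inputs, w_in, w_out, threshold):
    """Neurons whose importance exceeds the threshold on every input."""
    scores = np.stack([neuron_importance(x, w_in, w_out) for x in inputs])
    return set(np.flatnonzero((scores > threshold).all(axis=0)).tolist())

def language_specific(inputs_by_lang, w_in, w_out, threshold=0.5):
    """Per-language activated neurons, minus those shared by all languages."""
    active = {lang: activated_neurons(xs, w_in, w_out, threshold)
              for lang, xs in inputs_by_lang.items()}
    shared = set.intersection(*active.values())
    return {lang: ids - shared for lang, ids in active.items()}

# Toy layer: neuron 0 reads feature 0, neuron 1 reads feature 1,
# neuron 2 reads feature 2, and neuron 3 is never active.
w_in = np.array([[1.0, 0.0, 0.0, 0.0],
                 [0.0, 1.0, 0.0, 0.0],
                 [0.0, 0.0, 1.0, 0.0]])
w_out = np.eye(4)

# Unlabeled toy "corpora": feature 2 is present in both languages.
inputs_by_lang = {
    "en": np.array([[1.0, 0.0, 1.0], [2.0, 0.0, 1.0]]),
    "zh": np.array([[0.0, 1.0, 1.0], [0.0, 3.0, 1.0]]),
}
result = language_specific(inputs_by_lang, w_in, w_out)
```

In this toy setup neuron 2 fires for both languages and is discarded as shared, leaving neuron 0 for `"en"` and neuron 1 for `"zh"`. Only raw inputs grouped by language are needed, with no task labels, which mirrors the label-free setting described in the abstract.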
Taxonomy
Topics: Natural Language Processing Techniques
Methods: ALIGN
