How do Large Language Models Handle Multilingualism?

Yiran Zhao; Wenxuan Zhang; Guizhen Chen; Kenji Kawaguchi; Lidong Bing

arXiv:2402.18815·cs.CL·November 12, 2024·3 cites

TL;DR

This paper investigates how large language models process multiple languages, proposing a multilingual workflow model and a neuron detection method to improve and understand their multilingual capabilities.

Contribution

It introduces the $\texttt{MWork}$ model of multilingual processing and the $\texttt{PLND}$ method for neuron identification without labeled data, enabling targeted fine-tuning.

Findings

01 Validation of $\texttt{MWork}$ through neuron deactivation experiments

02 Fine-tuning language-specific neurons improves multilingual performance

03 Achieved average improvements of 3.6% and 2.3% for high- and low-resource languages

Abstract

Large language models (LLMs) have demonstrated impressive capabilities across diverse languages. This study explores how LLMs handle multilingualism. Based on observed language ratio shifts among layers and the relationships between network structures and certain capabilities, we hypothesize the LLM's multilingual workflow ($MWork$): LLMs initially understand the query, converting multilingual inputs into English for task-solving. In the intermediate layers, they employ English for thinking and incorporate multilingual knowledge with self-attention and feed-forward structures, respectively. In the final layers, LLMs generate responses aligned with the original language of the query. To verify $MWork$, we introduce Parallel Language-specific Neuron Detection ($PLND$) to identify activated neurons for inputs in different languages without any labeled data. Using…
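
The idea of finding language-specific neurons from unlabeled inputs can be illustrated with a toy sketch. The abstract as shown here is truncated and does not state the actual $PLND$ scoring rule, so everything below is an assumption made for illustration: a neuron counts as "activated" for a language when its mean absolute activation over that language's inputs exceeds a threshold, and as "language-specific" when it is activated for that language but not for every language. The function names, the threshold, and the toy activations are hypothetical and are not taken from the paper or its repository.

```python
from typing import Dict, List, Set


def detect_active_neurons(activations: List[List[float]], threshold: float) -> Set[int]:
    """Return indices of neurons whose mean absolute activation exceeds `threshold`.

    activations[i][j] is the activation of neuron j on input i.
    ASSUMPTION: mean absolute activation is a stand-in importance score,
    not the scoring rule used by PLND in the paper.
    """
    n_inputs = len(activations)
    n_neurons = len(activations[0])
    active: Set[int] = set()
    for j in range(n_neurons):
        mean_abs = sum(abs(row[j]) for row in activations) / n_inputs
        if mean_abs > threshold:
            active.add(j)
    return active


def language_specific_neurons(
    per_language: Dict[str, List[List[float]]], threshold: float
) -> Dict[str, Set[int]]:
    """For each language, keep the active neurons that are not active in all languages."""
    active = {
        lang: detect_active_neurons(acts, threshold)
        for lang, acts in per_language.items()
    }
    # Neurons active for every language are treated as language-independent.
    shared = set.intersection(*active.values())
    return {lang: neurons - shared for lang, neurons in active.items()}


# Hypothetical activations: 2 unlabeled inputs per language, 4 neurons.
toy = {
    "en": [[0.9, 0.0, 0.8, 0.0], [1.1, 0.1, 0.7, 0.0]],
    "zh": [[1.0, 0.9, 0.0, 0.0], [0.8, 1.2, 0.1, 0.0]],
}
specific = language_specific_neurons(toy, threshold=0.5)
print(specific)  # {'en': {2}, 'zh': {1}}
```

In this toy data neuron 0 fires for both languages and is discarded as shared, while neurons 2 and 1 remain as the English- and Chinese-specific sets. No task labels are needed, only raw inputs grouped by language, which is the property the abstract attributes to $PLND$.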

Code & Models

Repositories

damo-nlp-sg/multilingual_analysis
jax · Official

Videos

How do Large Language Models Handle Multilingualism?· slideslive

Taxonomy

Topics: Natural Language Processing Techniques

Methods: ALIGN