LLaMA Beyond English: An Empirical Study on Language Capability Transfer
Jun Zhao, Zhihao Zhang, Luhui Gao, Qi Zhang, Tao Gui, Xuanjing Huang

TL;DR
This paper empirically investigates how to transfer LLaMA's language capabilities to non-English languages, analyzing factors like vocabulary extension and instruction tuning, and demonstrating effective transfer with minimal pretraining data across multiple languages.
Contribution
The paper provides a comprehensive empirical study of transferring LLaMA's capabilities to non-English languages, identifying which methods are effective and showing that only a small amount of pretraining data is required.
Findings
Performance comparable to state-of-the-art transfer models is achieved with less than 1% of the pretraining data
Similar trends are observed across 13 low-resource languages
Evaluation covers both knowledge alignment and response quality
Abstract
In recent times, substantial advancements have been witnessed in large language models (LLMs), exemplified by ChatGPT, showcasing remarkable proficiency across a range of complex tasks. However, many mainstream LLMs (e.g. LLaMA) are pretrained on an English-dominant corpus, which limits their performance in non-English languages. In this paper, we focus on how to effectively transfer the capabilities of language generation and instruction following to a non-English language. To answer this question, we conduct an extensive empirical investigation based on LLaMA, accumulating over 1440 GPU hours. We analyze the impact of key factors such as vocabulary extension, further pretraining, and instruction tuning on transfer. To accurately assess the model's level of knowledge, we employ four widely used standardized testing benchmarks: C-Eval, MMLU, AGI-Eval, and GAOKAO-Bench. Furthermore,…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification
