Continual Pre-Training for Cross-Lingual LLM Adaptation: Enhancing Japanese Language Capabilities
Kazuki Fujii, Taishi Nakamura, Mengsay Loem, Hiroki Iida, Masanari Ohi, Kakeru Hattori, Hirai Shota, Sakae Mizuki, Rio Yokota, Naoaki Okazaki

TL;DR
This paper presents Swallow, a Japanese-enhanced LLM created through cross-lingual continual pre-training of Llama 2, demonstrating significant improvements in Japanese language tasks and insights into effective training methodologies.
Contribution
Introduces Swallow, a Japanese language model built via vocabulary expansion and continual pre-training, with analysis of methods improving cross-lingual adaptation.
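Vocabulary expansion adds rows to the model's token-embedding matrix, and those rows need starting values before continual pre-training begins. This summary does not say how Swallow initializes them. The sketch below assumes one common choice: each new token's vector is the mean of the vectors of the pieces that the original tokenizer splits it into. The function names, the toy 2-dimensional vectors, and the use of plain character splitting as a stand-in for the original tokenizer are all hypothetical.

```python
"""Sketch: initializing embeddings for tokens added by vocabulary expansion.

Assumption (not stated in the paper summary): a new token's vector is the
element-wise mean of the vectors of its pieces under the ORIGINAL tokenizer.
"""
from typing import Callable, Dict, List

Vector = List[float]


def mean_vector(vectors: List[Vector]) -> Vector:
    """Element-wise mean of a non-empty list of equally sized vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def expand_embeddings(
    embeddings: Dict[str, Vector],
    new_tokens: List[str],
    old_tokenize: Callable[[str], List[str]],
) -> Dict[str, Vector]:
    """Return a copy of `embeddings` with one row per token in `new_tokens`."""
    expanded = dict(embeddings)  # rows of the original vocabulary stay unchanged
    for token in new_tokens:
        if token in expanded:
            continue  # already part of the original vocabulary
        pieces = old_tokenize(token)  # how the old tokenizer would split it
        expanded[token] = mean_vector([embeddings[p] for p in pieces])
    return expanded


if __name__ == "__main__":
    # Hypothetical toy vectors for pieces the original vocabulary already has.
    old = {"日": [1.0, 0.0], "本": [0.0, 1.0], "語": [1.0, 1.0]}
    # Hypothetical stand-in for the original tokenizer: split into characters.
    new = expand_embeddings(old, ["日本", "日本語"], list)
    print(new["日本"])  # [0.5, 0.5]
```

Averaging places each new vector inside the region of embedding space the model already uses, which is one motivation for this choice over random initialization. With a real Llama 2 tokenizer the "pieces" could include byte-level tokens rather than whole characters.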
Findings
Continual pre-training drastically improved performance on Japanese tasks.
Performance increased monotonically with the amount of training data, up to 100B tokens.
Vocabulary expansion and parallel corpora enhance translation ability.
Abstract
Cross-lingual continual pre-training of large language models (LLMs) initially trained on an English corpus allows us to leverage the vast amount of English language resources and reduce the pre-training cost. In this study, we constructed Swallow, an LLM with enhanced Japanese capability, by extending the vocabulary of Llama 2 to include Japanese characters and conducting continual pre-training on a large Japanese web corpus. Experimental results confirmed that the performance on Japanese tasks drastically improved through continual pre-training, and the performance monotonically increased with the amount of training data up to 100B tokens. Consequently, Swallow achieved superior performance compared to other LLMs that were trained from scratch in English and Japanese. An analysis of the effects of continual pre-training revealed that it was particularly effective for Japanese question…
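One reason extending the vocabulary with Japanese characters helps: a SentencePiece tokenizer with byte fallback, like the one Llama 2 uses, encodes a character missing from its vocabulary as its UTF-8 bytes, and most Japanese characters take three bytes. The toy sketch below assumes a greedy longest-match tokenizer with UTF-8 byte fallback as a stand-in for that tokenizer; the `tokenize` function and both vocabularies are hypothetical, and the "no Japanese entries" case is an exaggeration, since the real Llama 2 vocabulary does contain some Japanese characters.

```python
"""Toy illustration: why adding Japanese tokens shortens token sequences.

Assumptions: greedy longest match with UTF-8 byte fallback stands in for
Llama 2's SentencePiece tokenizer; all vocabularies here are hypothetical.
"""
from typing import List, Set


def tokenize(text: str, vocab: Set[str], max_len: int = 8) -> List[str]:
    """Greedy longest-match tokenization with UTF-8 byte fallback."""
    tokens: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest vocabulary entry that starts at position i.
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in vocab:
                tokens.append(piece)
                i += length
                break
        else:
            # No entry matched: emit one byte token per UTF-8 byte of the character.
            tokens.extend(f"<0x{b:02X}>" for b in text[i].encode("utf-8"))
            i += 1
    return tokens


if __name__ == "__main__":
    text = "日本語のモデル"  # "Japanese-language model"
    no_japanese: Set[str] = set()  # extreme case: no Japanese entries at all
    expanded = {"日本語", "の", "モデル"}  # hypothetical Japanese additions
    print(tokenize(text, no_japanese)[:3])  # ['<0xE6>', '<0x97>', '<0xA5>']
    print(len(tokenize(text, no_japanese)))  # 21
    print(tokenize(text, expanded))  # ['日本語', 'の', 'モデル']
```

Without Japanese entries the seven characters cost 21 tokens (seven characters times three bytes each); with the three hypothetical entries they cost 3. Shorter sequences fit more Japanese text into each context window and each training step.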
Code & Models
- tokyotech-llm/Swallow-13b-hf · Hugging Face model · 26 downloads · 12 likes
- tokyotech-llm/Swallow-70b-hf · Hugging Face model · 26 downloads · 10 likes
- tokyotech-llm/Swallow-7b-hf · Hugging Face model · 351 downloads · 17 likes
- tokyotech-llm/Swallow-7b-NVE-hf · Hugging Face model · 17 downloads · 2 likes
- tokyotech-llm/Swallow-7b-NVE-instruct-hf · Hugging Face model · 28 downloads · 3 likes
- tokyotech-llm/Swallow-7b-instruct-hf · Hugging Face model · 165 downloads · 44 likes
- tokyotech-llm/Swallow-13b-instruct-hf · Hugging Face model · 24 downloads · 18 likes
- tokyotech-llm/Swallow-70b-NVE-hf · Hugging Face model · 28 downloads · 1 like
- tokyotech-llm/Swallow-70b-instruct-hf · Hugging Face model · 794 downloads · 37 likes
- tokyotech-llm/Swallow-70b-NVE-instruct-hf · Hugging Face model · 17 downloads · 2 likes
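The released checkpoints follow a regular naming pattern, Swallow-<size>[-NVE][-instruct]-hf. The hypothetical helper below only reproduces that observed pattern; it does not query the Hugging Face Hub, and not every combination exists (no 13B NVE variant appears in the list). Reading "NVE" as the variants trained without vocabulary expansion is an assumption, not something this page states.

```python
"""Hypothetical helper reproducing the Swallow repository naming pattern."""

ORG = "tokyotech-llm"
SIZES = ("7b", "13b", "70b")  # sizes that appear in the model list


def swallow_repo_id(size: str, nve: bool = False, instruct: bool = False) -> str:
    """Build an id of the form tokyotech-llm/Swallow-<size>[-NVE][-instruct]-hf.

    The id is only pattern-based; check that the repository actually exists.
    """
    if size not in SIZES:
        raise ValueError(f"unknown size {size!r}; expected one of {SIZES}")
    parts = ["Swallow", size]
    if nve:
        parts.append("NVE")  # assumed: variant without vocabulary expansion
    if instruct:
        parts.append("instruct")  # instruction-tuned variant
    parts.append("hf")
    return f"{ORG}/" + "-".join(parts)


if __name__ == "__main__":
    print(swallow_repo_id("7b", instruct=True))  # tokyotech-llm/Swallow-7b-instruct-hf
```

An id built this way can be passed to `AutoTokenizer.from_pretrained` and `AutoModelForCausalLM.from_pretrained` from the Hugging Face `transformers` library, which downloads the weights on first use.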
Taxonomy
Topics: Natural Language Processing Techniques · Text Readability and Simplification · Second Language Acquisition and Learning
Methods: LLaMA
