Optimizing Low-Resource Language Model Training: Comprehensive Analysis of Multi-Epoch, Multi-Lingual, and Two-Stage Approaches
Kosuke Akimoto, Masafumi Oyamada

TL;DR
This paper systematically analyzes hyperparameter configurations for low-resource language LLM training, revealing optimal strategies for different data scenarios and providing practical guidelines for efficient model development.
Contribution
It offers a comprehensive exploration of multi-epoch, multi-lingual, and two-stage training approaches, identifying optimal setups and stable model scales for low-resource languages.
Findings
The optimal training approach shifts with the amount of target-language data
The optimal model scale remains stable regardless of data size
Validation loss follows a power-law relationship in single-stage training
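The summary above does not state the exact functional form of the power law or which quantity it is measured against. The sketch below is a minimal illustration under two assumptions. First, it assumes a pure power law L(x) = a · x^(-alpha), with no irreducible-loss term. Second, x stands for some scale variable, such as target-language training tokens. The function name `fit_power_law` and the parameters `a` and `alpha` are hypothetical, not taken from the paper. The fit is ordinary least squares on log L against log x, where the slope gives -alpha. The demo uses synthetic data generated from a known law, not results from the paper.

```python
import math


def fit_power_law(xs, ys):
    """Hypothetical helper: fit y = a * x**(-alpha) via least squares in log-log space.

    Assumes a pure power law with no irreducible-loss offset.
    Returns (a, alpha).
    """
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    # Slope of the least-squares line through (log x, log y)
    sxx = sum((u - mean_x) ** 2 for u in lx)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # log y = log a - alpha * log x, so a = exp(intercept) and alpha = -slope
    return math.exp(intercept), -slope


# Synthetic (not from the paper): losses generated from a = 5.0, alpha = 0.3
xs = [1e6, 1e7, 1e8, 1e9]
ys = [5.0 * x ** -0.3 for x in xs]
a, alpha = fit_power_law(xs, ys)
print(f"a = {a:.4f}, alpha = {alpha:.4f}")
```

Fitting in log-log space turns the power law into a straight line, so the fit needs no nonlinear optimizer. If the true relationship includes a constant floor term, this linearization no longer applies and a nonlinear fit would be needed instead.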
Abstract
In this paper, we address the challenge of optimizing training setups for Large Language Models (LLMs) in low-resource languages with a limited corpus. Existing work adopts multi-epoch, multi-lingual, and two-stage training to use the limited target-language corpus efficiently. However, there is still a lack of understanding about the optimal hyperparameter setups for combining these three approaches to train LLMs. We exhaustively explore training setups for low-resource language LLMs that combine these three approaches, and find the following insights for efficiently reducing the cost of hyperparameter search: (1) As the amount of target-language corpus decreases, the optimal training approach shifts from monolingual single-stage training to multi-lingual two-stage training at a compute-budget-dependent threshold. (2) The optimal model scale remains stable regardless of the…
Taxonomy
Topics: Natural Language Processing Techniques
