Optimizing Low-Resource Language Model Training: Comprehensive Analysis of Multi-Epoch, Multi-Lingual, and Two-Stage Approaches

Kosuke Akimoto; Masafumi Oyamada

arXiv:2410.12325 · cs.CL · October 17, 2024

TL;DR

This paper systematically analyzes hyperparameter configurations for low-resource language LLM training, revealing optimal strategies for different data scenarios and providing practical guidelines for efficient model development.

Contribution

It offers a comprehensive exploration of multi-epoch, multi-lingual, and two-stage training approaches, identifying optimal setups and stable model scales for low-resource languages.

Findings

1. The optimal training approach shifts with the amount of target-language data.
2. The optimal model scale remains stable regardless of data size.
3. Validation loss follows a power-law relationship in single-stage training.
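Finding 3 says validation loss follows a power law in single-stage training. The summary does not say which variable the loss is measured against, and it does not give the fitted form. The sketch below assumes a pure power law L(x) = a · x^(−α) in a hypothetical scale variable x, for example the number of training tokens. Under that assumption, the exponent can be estimated with ordinary least squares in log-log space. The data are synthetic, generated from a = 20 and α = 0.1 purely for illustration. They are not results from the paper.

```python
import math


def fit_power_law(xs, losses):
    """Fit L(x) = a * x**(-alpha) via least squares on log L = log a - alpha * log x.

    Assumes a pure power law with no irreducible-loss offset (an illustrative
    assumption, not the paper's stated form). Returns (a, alpha).
    """
    if len(xs) != len(losses) or len(xs) < 2:
        raise ValueError("need at least two paired (x, loss) points")
    if any(x <= 0 for x in xs) or any(l <= 0 for l in losses):
        raise ValueError("power-law fit in log space needs positive values")

    # Move to log-log space, where a power law becomes a straight line.
    lx = [math.log(x) for x in xs]
    ly = [math.log(l) for l in losses]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n

    # Ordinary least-squares slope and intercept.
    sxx = sum((u - mean_x) ** 2 for u in lx)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # The slope in log-log space is -alpha, and the intercept is log a.
    return math.exp(intercept), -slope


# Hypothetical, noise-free data generated from a = 20, alpha = 0.1.
xs = [1e6, 1e7, 1e8, 1e9]
losses = [20.0 * x ** -0.1 for x in xs]
a, alpha = fit_power_law(xs, losses)
print(f"a={a:.3f}, alpha={alpha:.3f}")
```

Real loss curves often include an irreducible offset, L(x) = a · x^(−α) + E. That offset biases a plain log-log fit, so fitting that form needs a nonlinear optimizer instead.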

Abstract

In this paper, we address the challenge of optimizing training setups for Large Language Models (LLMs) in low-resource languages with a limited corpus. Existing works adopt multi-epoch, multi-lingual, and two-stage training to use the limited target-language corpus efficiently. However, the optimal hyperparameter setups for combining these three approaches to train LLMs are still poorly understood. We exhaustively explore training setups for low-resource-language LLMs that combine these three approaches, and find the following insights for efficiently reducing the cost of hyperparameter search: (1) As the amount of target-language corpus decreases, the optimal training approach shifts from monolingual single-stage training to multi-lingual two-stage training at a compute-budget-dependent threshold. (2) The optimal model scale remains stable regardless of the…

Taxonomy

Topics: Natural Language Processing Techniques