ELO: Efficient Layer-Specific Optimization for Continual Pretraining of Multilingual LLMs
HanGyeol Yoo, ChangSu Choi, Minjun Kim, Seohyun Song, SeungWoo Song, Inho Won, Jongyoul Park, Cheoneum Park, KyungTae Lim

TL;DR
The paper introduces ELO, a layer-specific optimization technique that accelerates continual pretraining of multilingual LLMs, reducing computational costs and improving target language performance while maintaining source language abilities.
Contribution
ELO is a method that trains only the layers most important for the target language (the first and last layers, per the abstract), which speeds up training and improves target-language performance relative to traditional continual pretraining.
Findings
Achieves up to 6.46x training speedup.
Improves target language performance by up to 6.2%.
Effectively preserves source language capabilities.
Abstract
We propose an efficient layer-specific optimization (ELO) method designed to enhance continual pretraining (CP) for specific languages in multilingual large language models (MLLMs). This approach addresses two common challenges of traditional CP: high computational cost and degradation of source language performance. The ELO method consists of two main stages: (1) ELO Pretraining, where a small subset of specific layers, identified in our experiments as the critically important first and last layers, is detached from the original MLLM and trained on the target language. This significantly reduces not only the number of trainable parameters but also the total number of parameters computed during the forward pass, minimizing GPU memory consumption and accelerating the training process. (2) Layer Alignment, where the newly trained layers are reintegrated into the original model,…
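The detach, train, and reintegrate flow described in the abstract can be sketched with a toy model. This is a minimal illustration and not the authors' implementation. The layer count, parameter counts, and every function name here are hypothetical. The training step is a placeholder that only tags the layers, and the alignment procedure is not modeled because the abstract above is truncated before describing it.

```python
from copy import deepcopy

# Hypothetical toy model: each "layer" is a dict holding a name, a parameter
# count, and a tag saying which data it was last trained on.
def make_toy_model(num_layers=8, params_per_layer=1_000):
    return [
        {"name": f"layer_{i}", "params": params_per_layer, "trained_on": "source"}
        for i in range(num_layers)
    ]

def detach_layers(model, indices):
    """Copy the selected layers out of the full model, keyed by position."""
    return {i: deepcopy(model[i]) for i in indices}

def train_on_target(detached):
    """Placeholder for target-language training: only re-tags the layers."""
    for layer in detached.values():
        layer["trained_on"] = "target"
    return detached

def reintegrate(model, detached):
    """Return a copy of the model with the trained layers put back in place."""
    merged = deepcopy(model)
    for i, layer in detached.items():
        merged[i] = layer
    return merged

model = make_toy_model()
selected = [0, len(model) - 1]  # first and last layers, as in the abstract

detached = train_on_target(detach_layers(model, selected))
merged = reintegrate(model, detached)

# Share of parameters that were trained in this toy setup.
trainable_fraction = (
    sum(layer["params"] for layer in detached.values())
    / sum(layer["params"] for layer in model)
)
print(f"trainable fraction: {trainable_fraction:.2f}")
print([layer["trained_on"] for layer in merged])
```

In this toy setup only the two detached layers are touched during training, which mirrors the abstract's point that both the trainable parameters and the parameters computed in the forward pass shrink. The actual savings in a real MLLM depend on its architecture and are not implied by these made-up numbers.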
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis
