Cross-Lingual Optimization for Language Transfer in Large Language Models

Jungseob Lee, Seongtae Hong, Hyeonseok Moon, and Heuiseok Lim

arXiv:2505.14297 · cs.CL · May 21, 2025

TL;DR

This paper introduces Cross-Lingual Optimization (CLO), a method that efficiently transfers large language models to new languages while maintaining their English capabilities. It is especially effective in low-resource settings.

Contribution

CLO uses publicly available English supervised fine-tuning (SFT) data and a translation model for cross-lingual transfer, outperforming standard SFT in low-resource scenarios while using less data.

Findings

01. CLO outperforms SFT in target language proficiency and English retention.

02. CLO with fewer samples surpasses SFT with more data in low-resource languages.

03. CLO remains robust across varying data quantities, unlike SFT.

Abstract

Adapting large language models to other languages typically relies on supervised fine-tuning (SFT). However, SFT often suffers from an overemphasis on English performance, a phenomenon that is especially pronounced in data-constrained environments. To overcome these challenges, we propose Cross-Lingual Optimization (CLO), which efficiently transfers an English-centric LLM to a target language while preserving its English capabilities. CLO utilizes publicly available English SFT data and a translation model to enable cross-lingual transfer. We conduct experiments using five models on six languages with varying levels of resources. Our results show that CLO consistently outperforms SFT in both acquiring target language proficiency and maintaining English performance. Remarkably, in low-resource languages, CLO with only 3,200 samples surpasses SFT with…
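
The abstract says CLO draws on English SFT data and a translation model, but it does not say how the two are combined during training. The sketch below therefore stops at data preparation: it pairs each English sample with a translated counterpart. Everything in it is an assumption made for illustration, including the `SFTSample` fields, the `build_parallel_samples` helper, and the toy lexicon standing in for a real translation model. It is not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class SFTSample:
    """One instruction/response pair (hypothetical field names)."""
    instruction: str
    response: str


def build_parallel_samples(
    english_data: List[SFTSample],
    translate: Callable[[str], str],
) -> List[Dict[str, SFTSample]]:
    """Pair every English sample with its translation into the target language."""
    pairs = []
    for sample in english_data:
        translated = SFTSample(
            instruction=translate(sample.instruction),
            response=translate(sample.response),
        )
        pairs.append({"en": sample, "target": translated})
    return pairs


# Toy stand-in for a translation model (assumption, illustration only).
_TOY_LEXICON = {"Hello": "Hola", "Hi there": "Hola a todos"}


def toy_translate(text: str) -> str:
    # Unknown strings pass through unchanged.
    return _TOY_LEXICON.get(text, text)


pairs = build_parallel_samples([SFTSample("Hello", "Hi there")], toy_translate)
```

In practice `toy_translate` would be replaced by a call to whichever translation model is available. Keeping the translator as a plain callable means the pairing logic does not depend on any particular translation library.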

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis

Methods: Shrink and Fine-Tune