LinguaLIFT: An Effective Two-stage Instruction Tuning Framework for Low-Resource Language Reasoning

Hongbin Zhang; Kehai Chen; Xuefeng Bai; Yang Xiang; Min Zhang

arXiv:2412.12499·cs.CL·February 18, 2025

TL;DR

LinguaLIFT is a two-stage instruction tuning framework that enhances reasoning abilities in low-resource languages by leveraging a language alignment layer and English-only data, addressing resource imbalance and evaluation bias.

Contribution

It introduces LinguaLIFT, a novel two-stage instruction tuning method with a language alignment layer that improves low-resource language reasoning without requiring multilingual instruction data.

Findings

1. Outperforms baseline models on MMWP and other benchmarks
2. Effectively transfers cross-lingual reasoning to low-resource languages
3. Introduces the Multilingual Math World Problem benchmark

Abstract

Large language models (LLMs) have exhibited impressive multilingual reasoning capabilities, driven by extensive multilingual pre-training corpora and instruction fine-tuning data. However, a performance gap exists between high- and low-resource language reasoning tasks due to the language imbalance in the pre-training corpus, which is exacerbated by evaluation bias in existing reasoning benchmarks lacking low-resource language coverage. To alleviate this issue, we propose LinguaLIFT, a two-stage instruction tuning framework for advancing low-resource language reasoning. LinguaLIFT employs a language alignment layer to capture multilingual alignment through code-switched tuning, without requiring multilingual instruction or parallel data, thereby transferring cross-lingual reasoning capabilities to low-resource languages through English-only instruction tuning data. To…
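
The abstract refers to code-switched tuning but this listing does not describe how the code-switched inputs are built. As a rough illustration only, the sketch below assumes a simple word-level substitution from a bilingual lexicon. The lexicon entries, the `code_switch` function, and the `ratio` parameter are hypothetical and are not taken from the paper.

```python
import random

# Hypothetical toy English -> Swahili lexicon, for illustration only.
LEXICON = {
    "teacher": "mwalimu",
    "book": "kitabu",
    "school": "shule",
    "water": "maji",
}


def code_switch(sentence, lexicon, ratio, rng):
    """Replace a fraction of the lexicon-covered words with their translations.

    sentence: whitespace-tokenised English text
    lexicon:  dict mapping lowercase English words to target-language words
    ratio:    fraction (0.0 to 1.0) of covered words to substitute
    rng:      random.Random instance, passed in for reproducibility
    """
    tokens = sentence.split()
    # Positions of tokens that have an entry in the lexicon.
    candidates = [i for i, tok in enumerate(tokens) if tok.lower() in lexicon]
    k = round(len(candidates) * ratio)
    chosen = set(rng.sample(candidates, k))
    switched = [
        lexicon[tok.lower()] if i in chosen else tok
        for i, tok in enumerate(tokens)
    ]
    return " ".join(switched)


if __name__ == "__main__":
    rng = random.Random(0)
    print(code_switch("The teacher reads a book at school", LEXICON, 0.5, rng))
```

Under this assumption, mixed-language sentences can be produced from English text plus a dictionary, which is consistent with the abstract's statement that no parallel data or multilingual instruction data is required. The actual construction used by the authors may differ.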

Taxonomy

Topics: Speech and dialogue systems · Intelligent Tutoring Systems and Adaptive Learning