TL;DR
This paper studies prefix-based adaptation methods for zero-shot cross-lingual transfer in decoder-only large language models. It reports that these methods transfer from English to 35+ languages, scale across model sizes from 1B to 24B, and outperform LoRA.
Contribution
It provides a comprehensive analysis of prefix-based methods for zero-shot multilingual transfer, showing they outperform LoRA in various settings with minimal additional parameters.
Findings
Prefix methods outperform LoRA by up to 6% on the Belebele benchmark.
Consistent improvements across diverse benchmarks and languages.
Effective even with only 1.23M learnable parameters.
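To give a feel for what a parameter budget of this size means, here is a rough sketch of how learnable-parameter counts are typically computed for prefix-style methods and for LoRA. Everything in it is an assumption for illustration: the function names are hypothetical, the hidden size of 4096 is that of Llama 3.1 8B, and the prompt length, layer count, LoRA rank, and square projection shapes are placeholder values. The summary does not state which configuration yields the 1.23M figure.

```python
def soft_prompt_params(prompt_len: int, hidden_dim: int) -> int:
    """Soft prompt tuning: one learnable vector per virtual token, input layer only."""
    return prompt_len * hidden_dim


def layerwise_prefix_params(prompt_len: int, hidden_dim: int, num_layers: int) -> int:
    """Learnable prefix vectors injected at each of `num_layers` layers."""
    return prompt_len * hidden_dim * num_layers


def lora_params(rank: int, d_in: int, d_out: int, num_matrices: int) -> int:
    """LoRA: each adapted weight matrix gets two factors, (d_in x rank) and (rank x d_out)."""
    return rank * (d_in + d_out) * num_matrices


# Illustrative values only (assumptions, not taken from the paper).
hidden_dim = 4096

print(soft_prompt_params(prompt_len=10, hidden_dim=hidden_dim))
print(layerwise_prefix_params(prompt_len=10, hidden_dim=hidden_dim, num_layers=30))
# Hypothetical LoRA setup: rank 8 on 64 square 4096 x 4096 projections.
print(lora_params(rank=8, d_in=hidden_dim, d_out=hidden_dim, num_matrices=64))
```

Under these placeholder values, the layer-wise prefix count comes to about 1.23M, while the hypothetical LoRA setup is several times larger. The comparison is only meant to show the order of magnitude involved, not the settings used in the study.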
Abstract
With the release of new large language models (LLMs) like Llama and Mistral, zero-shot cross-lingual transfer has become increasingly feasible due to their multilingual pretraining and strong generalization capabilities. However, adapting these decoder-only LLMs to new tasks across languages remains challenging. While parameter-efficient fine-tuning (PeFT) techniques like Low-Rank Adaptation (LoRA) are widely used, prefix-based techniques such as soft prompt tuning, prefix tuning, and Llama Adapter are less explored, especially for zero-shot transfer in decoder-only models. We present a comprehensive study of three prefix-based methods for zero-shot cross-lingual transfer from English to 35+ high- and low-resource languages. Our analysis further explores transfer across linguistic families and scripts, as well as the impact of scaling model sizes from 1B to 24B. With Llama 3.1 8B,…