Language Fusion for Parameter-Efficient Cross-lingual Transfer

Philipp Borchert; Ivan Vulić; Marie-Francine Moens; Jochen De Weerdt

arXiv:2501.06892 · cs.CL · May 27, 2025

TL;DR

This paper introduces FLARE, a parameter-efficient adapter method that improves cross-lingual transfer performance by integrating source and target language representations, especially for underrepresented languages, without increasing computational complexity.

Contribution

FLARE is a novel adapter-based approach that enhances multilingual representation quality and transfer performance while maintaining parameter efficiency.

Findings

01

FLARE improves question-answering performance by 4.9% on Llama 3.1.

02

FLARE achieves a 2.2% improvement on Gemma 2.

03

Experiments demonstrate FLARE's effectiveness across multiple NLP tasks.

Abstract

Limited availability of multilingual text corpora for training language models often leads to poor performance on downstream tasks due to undertrained representation spaces for languages other than English. This 'under-representation' has motivated recent cross-lingual transfer methods to leverage the English representation space by, e.g., mixing English and 'non-English' tokens at the input level or extending model parameters to accommodate new languages. However, these approaches often come at the cost of increased computational complexity. We propose Fusion for Language Representations (FLARE) in adapters, a novel method that enhances representation quality and downstream performance for languages other than English while maintaining parameter efficiency. FLARE integrates source and target language representations within low-rank (LoRA) adapters using lightweight linear transformations,…
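The fusion idea in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (the official PyTorch code is in the repository listed under Code & Models). It assumes that source and target representations have the same sequence length, that both pass through the same LoRA down-projection, and that they are combined element-wise in the rank-r bottleneck. The function name `fused_lora_forward`, the `fuse` options, and the `alpha` scaling are hypothetical choices made for illustration.

```python
import numpy as np


def fused_lora_forward(x_tgt, x_src, W, A, B, alpha=16.0, fuse="add"):
    """Frozen linear layer plus a LoRA update whose bottleneck mixes two inputs.

    Shapes: x_tgt, x_src (seq, d_in); W (d_in, d_out); A (d_in, r); B (r, d_out).
    Assumption: both inputs are aligned to the same sequence length.
    """
    r = A.shape[1]
    z_tgt = x_tgt @ A  # target-language input in the rank-r bottleneck
    z_src = x_src @ A  # source-language (e.g. English) input, same projection
    if fuse == "add":
        z = z_tgt + z_src  # element-wise fusion, no new weights
    elif fuse == "mul":
        z = z_tgt * z_src
    else:
        raise ValueError(f"unknown fusion: {fuse!r}")
    # Frozen base output plus the scaled low-rank update.
    return x_tgt @ W + (alpha / r) * (z @ B)


rng = np.random.default_rng(0)
seq, d_in, d_out, r = 4, 8, 6, 2
x_tgt = rng.normal(size=(seq, d_in))
x_src = rng.normal(size=(seq, d_in))
W = rng.normal(size=(d_in, d_out))  # stands in for a frozen pretrained weight
A = rng.normal(size=(d_in, r))
B = np.zeros((r, d_out))  # common LoRA init: the update starts at zero

out = fused_lora_forward(x_tgt, x_src, W, A, B)
print(out.shape)        # (4, 6)
print(A.size + B.size)  # 28 trainable values, the same as plain LoRA
```

In this sketch the fusion happens on rank-r vectors, so the trainable parameter count equals that of a standard LoRA adapter and the input sequence is not lengthened.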

Peer Reviews

Decision · ICLR 2025 Conference Withdrawn Submission

Reviewer 01 · Rating 3 · Confidence 4

Strengths

- The method is parameter-efficient, as it achieves improved cross-lingual performance without additional parameters.
- The method maintains computational efficiency by fusing representations within adapters rather than extending input sequences.
- The method allows the integration of various representation types, such as latent translations from machine translation models.

Weaknesses

- FLARE’s performance is dependent on the quality of machine translations, potentially limiting its effectiveness in low-resource languages.
- The method has been tested primarily in bilingual scenarios, which may limit its generalizability to more complex multilingual contexts.

Reviewer 02 · Rating 6 · Confidence 3

Strengths

- Parameter/computation efficiency: No additional parameters are needed to improve performance, as the fusion occurs within the adapter bottlenecks.
- Quality: The work shows positive experimental results that narrow the gap between English and non-English performance across multiple downstream tasks. The comprehensive discussion and analysis in Section 5 also offers useful insights to the community regarding this research direction.

Weaknesses

- Limitations: As the authors note, this work focuses on bilingual transfer, so it is very difficult to draw conclusions about the effectiveness of this method in language identification-agnostic scenarios.
- Dependencies on data quality: Multilingual and, especially, low-resource data present significant challenges in LLM training. While this work seems promising, the impact of removing this dependency has not been fully addressed.

Reviewer 03 · Rating 5 · Confidence 4

Strengths

1. FLARE enhances cross-lingual transfer without increasing the model's parameter count or computational overhead. By integrating source (e.g., English) and target language representations within LoRA adapters, it maintains efficiency while improving performance.
2. The method demonstrates consistent improvements across various tasks (natural language inference, question answering, and sentiment analysis), highlighting its general applicability in cross-lingual settings.
3. FLARE is evaluated on m

Weaknesses

1. The main contribution appears to be the integration of source and target language representations within LoRA adapters, which may be seen as an incremental extension of existing methods. The paper could benefit from a clearer articulation of how FLARE differentiates itself from prior work and what specific novel insights it brings to the field.
2. While FLARE shows consistent improvements, the performance gains over baselines are relatively modest. A more thorough analysis is needed to demon

Reviewer 04 · Rating 3 · Confidence 4

Strengths

The approach demonstrates broad applicability across various models, as evidenced by experiments conducted on XLM-R, mT5, and Llama3. Additionally, the integration of adapters with cross-lingual transfer strategies makes the method cost-efficient.

Weaknesses

Firstly, the modification to LoRA appears similar to existing methods [1, 2], and the observed improvements in XLT performance following adapter fine-tuning with representations from both source and target languages are unsurprising. According to the results presented in Table 1, the enhancements achieved by FLARE are marginal and constrained by the underlying translation model's performance, limiting its effectiveness for low-resource languages. Secondly, the chosen baselines seem weak. For in

Code & Models

Repositories

pnborchert/flare
PyTorch · Official

Videos

Language Fusion for Parameter-Efficient Cross-lingual Transfer · Underline

Taxonomy

Methods: LLaMA