Utilizing Multilingual Encoders to Improve Large Language Models for Low-Resource Languages

Imalsha Puranegedara; Themira Chathumina; Nisal Ranathunga; Nisansa de Silva; Surangika Ranathunga; Mokanarangan Thayaparan

arXiv:2508.09091 · cs.CL · November 11, 2025

TL;DR

This paper introduces a method that fuses all intermediate layers of a multilingual encoder, rather than only its final layer, to improve a large language model's performance on low-resource languages without using any multilingual training data.

Contribution

It proposes an architecture with two layer-fusion strategies, a Global Softmax (one weight per encoder layer) and a Transformer Softmax (token-specific layer weights), that improves multilingual understanding in LLMs while training only on English data.
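
A minimal NumPy sketch of the Global Softmax idea, assuming the fused output is a softmax-weighted sum of all encoder layer outputs with one learnable scalar per layer, shared by every token. The function names, the `(L, T, d)` array layout, and the use of plain NumPy instead of a training framework are illustrative assumptions, not the authors' code. With Hugging Face Transformers, per-layer states could for example be collected by running the encoder with `output_hidden_states=True`.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    z = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=axis, keepdims=True)


def global_softmax_fusion(layer_states, layer_logits):
    """Fuse all encoder layers with one softmax weight per layer.

    layer_states: (L, T, d) hidden states of all L encoder layers for T tokens
                  (hypothetical layout chosen for this sketch).
    layer_logits: (L,) learnable scalars, one per layer, shared by every token.
    Returns the fused representation with shape (T, d).
    """
    weights = softmax(layer_logits)                      # (L,), sums to 1
    return np.tensordot(weights, layer_states, axes=1)   # weighted sum over layers


# Usage: 4 layers, 3 tokens, hidden width 8, random stand-in activations.
rng = np.random.default_rng(0)
states = rng.normal(size=(4, 3, 8))
fused = global_softmax_fusion(states, np.zeros(4))   # zero logits -> uniform weights
print(fused.shape)  # (3, 8)
```

Because the logits start at zero, the initial fusion is a plain average of the layers; training would then shift weight toward the more useful layers. The same weights apply to every token, which is what separates this strategy from the token-specific one.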

Findings

1. Significant performance improvements on low-resource language (LRL) benchmarks.
2. Higher classification accuracy for Sinhala and Indic languages.
3. Overall XNLI accuracy increases from 70.36% to 71.50%.
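
For scale, the reported XNLI figures correspond to the following absolute and relative changes (simple arithmetic on the two numbers above):

```latex
\Delta_{\text{abs}} = 71.50\% - 70.36\% = 1.14 \text{ percentage points},
\qquad
\Delta_{\text{rel}} = \frac{1.14}{70.36} \approx 1.62\%
```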

Abstract

Large Language Models (LLMs) excel in English, but their performance degrades significantly on low-resource languages (LRLs) due to English-centric training. While methods like LangBridge align LLMs with multilingual encoders such as the Massively Multilingual Text-to-Text Transfer Transformer (mT5), they typically use only the final encoder layer. We propose a novel architecture that fuses all intermediate layers, enriching the linguistic information passed to the LLM. Our approach features two strategies: (1) a Global Softmax weighting for overall layer importance, and (2) a Transformer Softmax model that learns token-specific weights. The fused representations are mapped into the LLM's embedding space, enabling it to process multilingual inputs. The model is trained only on English data, without using any parallel or multilingual data. Evaluated on XNLI, IndicXNLI, Sinhala News…
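
To contrast with a single global weight vector, the sketch below lets each token choose its own mix of encoder layers and then maps the result into the LLM's embedding space. It is a deliberate simplification under stated assumptions: a single linear scorer over the layer-averaged token states stands in for the paper's Transformer, the projection is assumed to be one linear layer, and all names and sizes are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    z = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=axis, keepdims=True)


def token_softmax_fusion(layer_states, scorer_w, scorer_b):
    """Token-specific layer weighting (simplified stand-in for Transformer Softmax).

    layer_states: (L, T, d) hidden states of all L encoder layers for T tokens.
    scorer_w, scorer_b: (d, L) and (L,) parameters of a hypothetical linear
        scorer that replaces the paper's Transformer in this sketch.
    Returns fused states (T, d) and per-token layer weights (T, L).
    """
    query = layer_states.mean(axis=0)            # (T, d) scorer input (assumed)
    logits = query @ scorer_w + scorer_b         # (T, L) one logit per token and layer
    weights = softmax(logits, axis=-1)           # each token's weights sum to 1
    fused = np.einsum("tl,ltd->td", weights, layer_states)
    return fused, weights


def project_to_llm(fused, proj_w, proj_b):
    """Map encoder width d to LLM embedding width D (assumed single linear layer)."""
    return fused @ proj_w + proj_b               # (T, D)


# Usage: 5 layers, 4 tokens, encoder width 6, hypothetical LLM width 10.
rng = np.random.default_rng(1)
L, T, d, D = 5, 4, 6, 10
states = rng.normal(size=(L, T, d))
fused, weights = token_softmax_fusion(states, rng.normal(size=(d, L)), np.zeros(L))
llm_inputs = project_to_llm(fused, rng.normal(size=(d, D)), np.zeros(D))
print(weights.shape, llm_inputs.shape)  # (4, 5) (4, 10)
```

Compared with the global variant, the extra scorer lets the layer mix vary from token to token; the projected vectors are what the LLM would then consume as input embeddings.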
