Language on Demand, Knowledge at Core: Composing LLMs with Encoder-Decoder Translation Models for Extensible Multilinguality

Mengyu Bu; Yang Feng

arXiv:2603.17512·cs.CL·April 17, 2026

1 Repo, 2 Models, 1 Dataset

TL;DR

This paper introduces XBridge, a novel architecture that combines LLMs with pretrained translation models to improve multilingual capabilities, especially for low-resource and unseen languages, without retraining the LLM.

Contribution

XBridge leverages external translation models and alignment techniques to enhance multilingual performance of LLMs, addressing their imbalance across languages.

Findings

01

XBridge outperforms strong baselines in multilingual understanding and generation.

02

It significantly improves performance on low-resource and unseen languages.

03

No retraining of the LLM is required for these improvements.

Abstract

Large language models (LLMs) exhibit strong general intelligence, yet their multilingual performance remains highly imbalanced. Although LLMs encode substantial cross-lingual knowledge in a unified semantic space, they often struggle to reliably interface this knowledge with low-resource or unseen languages. Fortunately, pretrained encoder-decoder translation models already possess balanced multilingual capability, suggesting a natural complement to LLMs. In this work, we propose XBridge, a compositional encoder-LLM-decoder architecture that offloads multilingual understanding and generation to external pretrained translation models, while preserving the LLM as an English-centric core for general knowledge processing. To address the resulting representation misalignment across models, we introduce lightweight cross-model mapping layers and an optimal transport-based alignment objective,…
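The abstract names an optimal transport-based alignment objective but the excerpt here does not specify it. As a rough illustration only, the sketch below computes an entropy-regularized optimal transport cost between two small sets of hidden-state vectors using Sinkhorn iterations. The function names (`sinkhorn_plan`, `ot_alignment_loss`), the squared-Euclidean cost, the uniform marginals, and the values of `eps` and `n_iters` are all assumptions made for this example and are not taken from the paper or the XBridge repository.

```python
import math


def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def sinkhorn_plan(cost, eps=0.1, n_iters=200):
    """Approximate transport plan between uniform marginals (hypothetical helper).

    cost: n x m list of non-negative costs.
    eps: entropic regularization strength (assumed value; very small eps
         with large costs can underflow in this naive implementation).
    """
    n, m = len(cost), len(cost[0])
    a = [1.0 / n] * n  # uniform mass on source states (assumption)
    b = [1.0 / m] * m  # uniform mass on target states (assumption)
    kernel = [[math.exp(-c / eps) for c in row] for row in cost]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


def ot_alignment_loss(src_states, tgt_states, eps=0.1, n_iters=200):
    """Transport cost <plan, cost> between two sets of vectors (illustrative)."""
    cost = [[sq_dist(s, t) for t in tgt_states] for s in src_states]
    plan = sinkhorn_plan(cost, eps, n_iters)
    return sum(
        plan[i][j] * cost[i][j]
        for i in range(len(cost))
        for j in range(len(cost[0]))
    )


# Toy inputs standing in for mapped translation-model states and LLM states.
mapped_states = [[0.0, 0.0], [1.0, 1.0]]
llm_states_same = [[0.0, 0.0], [1.0, 1.0]]
llm_states_shifted = [[0.0, 1.0], [1.0, 2.0]]

loss_same = ot_alignment_loss(mapped_states, llm_states_same)
loss_shifted = ot_alignment_loss(mapped_states, llm_states_shifted)
```

In this toy setup the cost is near zero when the two sets coincide and grows when one set is shifted, which is the property an alignment loss of this kind relies on. In a real training setup the same quantity would be computed on batched tensors in an autodiff framework so that gradients reach the mapping layers.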

Code & Models

Repositories

ictnlp/XBridge
github

Models

Datasets

ICTNLP/XBridge
dataset · 47 dl
