SCALE: Synergized Collaboration of Asymmetric Language Translation Engines
Xin Cheng, Xun Wang, Tao Ge, Si-Qing Chen, Furu Wei, Dongyan Zhao, Rui Yan

TL;DR
SCALE is a collaborative framework that unites specialized translation models (STMs) and large language models (LLMs) to improve low-resource translation, outperforming few-shot GPT-4 and NLLB without fine-tuning the LLM.
Contribution
The paper introduces SCALE, a method that combines compact specialized translation models with general-purpose large language models in a single translation engine, improving translation quality and flexibility in low-resource settings.
Findings
Outperforms few-shot GPT-4 and the specialized NLLB models in challenging low-resource translation settings.
Achieves a consistent 4-point BLEURT improvement in Xhosa-to-English translation without tuning the LLM.
Uses an English-centric STM effectively as a pivot for multiple language pairs.
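The pivoting finding can be illustrated with a small sketch. This is not the authors' implementation: it assumes an English-centric STM exposed as a hypothetical callable `stm_to_english(text, src_lang)` and an LLM exposed as a hypothetical callable `llm_complete(prompt)`, and the prompt labels are an illustrative template. The STM supplies an English rendering that the LLM can lean on when producing a target language the STM does not cover directly.

```python
from typing import Callable


def pivot_translate(
    source: str,
    src_lang: str,
    tgt_lang: str,
    stm_to_english: Callable[[str, str], str],  # hypothetical English-centric STM
    llm_complete: Callable[[str], str],  # hypothetical LLM completion function
) -> str:
    """Translate src_lang -> tgt_lang, using the STM's English output as a pivot."""
    # Step 1: the English-centric STM translates the source into English.
    pivot = stm_to_english(source, src_lang)
    # Step 2: the LLM sees the source plus the English pivot and completes the
    # target-language line. The field labels are an assumed, illustrative format.
    prompt = (
        f"{src_lang} sentence: {source}\n"
        f"English translation: {pivot}\n"
        f"{tgt_lang} translation:"
    )
    return llm_complete(prompt).strip()
```

In practice the prompt would also carry few-shot demonstrations in the same layout; they are omitted here to keep the pivot step visible.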
Abstract
In this paper, we introduce SCALE, a collaborative framework that connects compact Specialized Translation Models (STMs) and general-purpose Large Language Models (LLMs) as one unified translation engine. By introducing the STM's translation into triplet in-context demonstrations, SCALE unlocks the refinement and pivoting abilities of the LLM, thus mitigating the language bias of the LLM and the parallel-data bias of the STM, enhancing the LLM's specialization without sacrificing generality, and facilitating continual learning without expensive LLM fine-tuning. Our comprehensive experiments show that SCALE significantly outperforms both few-shot LLMs (GPT-4) and specialized models (NLLB) in challenging low-resource settings. Moreover, in Xhosa to English translation, SCALE achieves a consistent 4-point BLEURT improvement without tuning the LLM and surpasses few-shot GPT-4 by 2.5 COMET points and 3.8 BLEURT points when…
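The triplet in-context demonstrations described in the abstract can be sketched as a prompt builder. This is a minimal illustration under stated assumptions, not the paper's exact prompt: it assumes each triplet is (source sentence, STM draft translation, reference translation), and the class `Triplet`, the function `build_scale_prompt`, and the field labels are all hypothetical names chosen for this sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Triplet:
    """One demonstration: source, STM draft, and reference (assumed layout)."""
    source: str
    stm_draft: str
    reference: str


def build_scale_prompt(
    demos: List[Triplet],
    query_source: str,
    query_draft: str,
    src_lang: str,
    tgt_lang: str,
) -> str:
    """Assemble a few-shot prompt where each example shows the LLM how a
    reference relates to an STM draft, so it can refine the query's draft.
    The labels below are a hypothetical template, not the authors' format."""
    blocks = []
    for d in demos:
        blocks.append(
            f"{src_lang} sentence: {d.source}\n"
            f"Draft translation: {d.stm_draft}\n"
            f"{tgt_lang} translation: {d.reference}"
        )
    # The test instance uses the same layout but leaves the final field open
    # for the LLM to complete.
    blocks.append(
        f"{src_lang} sentence: {query_source}\n"
        f"Draft translation: {query_draft}\n"
        f"{tgt_lang} translation:"
    )
    return "\n\n".join(blocks)
```

Because the STM draft appears in every demonstration, updating or swapping the STM changes the drafts the LLM refines without touching the LLM's weights, which is one way to read the abstract's claim about continual learning without LLM fine-tuning.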
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications
Methods: Multi-Head Attention · Attention Is All You Need · Dense Connections · Linear Layer · Label Smoothing · Absolute Position Encodings · Adam · Residual Connection · Layer Normalization · Softmax
