TL;DR
TriMix is a dynamic logit fusion framework that enhances low-resource language adaptation by balancing specialized small models, high-resource instruction tuning, and large model scaling, without requiring task annotations.
Contribution
Proposes TriMix, a test-time logit fusion method that improves low-resource language adaptation by dynamically combining logits from three model sources.
Findings
TriMix consistently outperforms single-model baselines and Proxy Tuning across four model families and eight LRLs.
Prioritizing the logits of the small LRL-specialized model is key to its success.
TriMix requires no LRL task annotations; the only training involved is continual pretraining of a small model.
Abstract
Adapting large language models (LLMs) to low-resource languages (LRLs) is constrained by the scarcity of task data and computational resources. Although Proxy Tuning offers a logit-level strategy for introducing scaling effects, it often fails in LRL settings because the large model's weak LRL competence might overwhelm the knowledge of specialized smaller models. We thus propose TriMix, a test-time logit fusion framework that dynamically balances capabilities from three different sources: LRL competence from a continually pretrained small model, task competence from high-resource language instruction tuning, and the scaling benefits of large models. It is data- and compute-efficient, requiring no LRL task annotations, and only continual pretraining on a small model. Experiments across four model families and eight LRLs show that TriMix consistently outperforms single-model baselines…
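The contrast between Proxy Tuning and a three-source fusion can be sketched on toy logits. The `proxy_tuning` function follows the usual Proxy Tuning formulation (large-model logits shifted by the difference between a tuned and an untuned small model). The `weighted_fusion` function is only an illustrative assumption: the abstract does not give TriMix's actual fusion rule or how its weights are chosen dynamically, so the fixed weights `w_lrl`, `w_task`, and `w_scale`, the function names, and the toy logit values are all hypothetical.

```python
import math


def softmax(logits):
    """Convert a list of logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])


def proxy_tuning(large_base, small_tuned, small_base):
    """Proxy Tuning: add the small model's tuned-minus-base offset
    to the large model's logits, token by token."""
    return [l + (t - b) for l, t, b in zip(large_base, small_tuned, small_base)]


def weighted_fusion(small_lrl, small_instruct, large, w_lrl, w_task, w_scale):
    """Hypothetical three-source fusion: a fixed weighted sum of logits.
    NOT the paper's rule; TriMix balances its sources dynamically."""
    return [
        w_lrl * a + w_task * b + w_scale * c
        for a, b, c in zip(small_lrl, small_instruct, large)
    ]


if __name__ == "__main__":
    # Toy next-token logits over a 3-token vocabulary (made-up values).
    small_lrl = [2.0, 0.5, 0.1]       # small model, continually pretrained on the LRL
    small_instruct = [1.0, 1.5, 0.2]  # small model, instruction-tuned on a high-resource language
    large = [0.2, 2.5, 0.3]           # large model with weak LRL competence

    # Weighting the LRL-specialized logits more heavily vs. weighting all sources equally.
    lrl_first = weighted_fusion(small_lrl, small_instruct, large, 0.6, 0.2, 0.2)
    equal = weighted_fusion(small_lrl, small_instruct, large, 1 / 3, 1 / 3, 1 / 3)

    print("LRL-prioritized pick:", argmax(lrl_first), softmax(lrl_first))
    print("Equal-weight pick:   ", argmax(equal), softmax(equal))
```

In this toy setup, the equal-weight fusion follows the large model's preferred token, while the LRL-prioritized weighting keeps the token favored by the LRL-specialized small model. This mirrors the failure mode the abstract attributes to Proxy Tuning in LRL settings, but it says nothing about the magnitudes reported in the paper.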
Code & Models
- 🤗 pkupie/Qwen2.5-1.5B-bo-cpt (model, 372 downloads)
- 🤗 pkupie/Qwen2.5-1.5B-ug-cpt (model, 374 downloads)
- 🤗 pkupie/Qwen2.5-1.5B-kk-cpt (model, 390 downloads)
- 🤗 pkupie/Qwen2.5-1.5B-mn-cpt (model, 406 downloads)
- 🤗 pkupie/Qwen2.5-3B-bo-cpt (model, 370 downloads)
- 🤗 pkupie/Qwen2.5-3B-ug-cpt (model, 379 downloads)
- 🤗 pkupie/Qwen2.5-3B-kk-cpt (model, 397 downloads)
- 🤗 pkupie/Qwen2.5-3B-mn-cpt (model, 429 downloads)
- 🤗 pkupie/gemma-3-4b-bo-cpt (model, 246 downloads)
- 🤗 pkupie/gemma-3-4b-ug-cpt (model, 266 downloads)
