TL;DR
KALAVAI introduces a protocol and a predictive model for post-hoc fusion of independently trained domain specialists into a single model, reporting consistent gains over the best individual specialist and near-oracle routing.
Contribution
The paper presents a protocol enabling predictable, high-performance fusion of independently trained specialists using lightweight routing and shared initialization.
Findings
Fusion gain is predictable from specialist divergence via a linear fit (R^2 = 0.856, n = 6).
Lightweight MoE routing matches domain-oracle routing to within <10^{-5} nats.
Cross-lingual fusion yields large perplexity reductions, e.g., Yoruba perplexity falls from 41.9 to 7.7.
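To relate the perplexity figures to the nats used for the routing result, the sketch below converts a perplexity change into a change in mean per-token loss. It assumes the standard definition of perplexity as the exponential of the mean negative log-likelihood in nats; the function names are illustrative and do not come from the paper.

```python
import math


def nll_drop_nats(ppl_before: float, ppl_after: float) -> float:
    """Drop in mean per-token loss (nats) implied by a perplexity change.

    Assumes perplexity = exp(mean negative log-likelihood in nats).
    """
    return math.log(ppl_before / ppl_after)


def relative_ppl_reduction(ppl_before: float, ppl_after: float) -> float:
    """Fractional reduction in perplexity, e.g. 0.5 means perplexity halved."""
    return (ppl_before - ppl_after) / ppl_before


if __name__ == "__main__":
    # Yoruba figures quoted in the findings above.
    print(f"loss drop: {nll_drop_nats(41.9, 7.7):.3f} nats")
    print(f"perplexity reduction: {relative_ppl_reduction(41.9, 7.7):.1%}")
```

Under that assumption, the Yoruba change from 41.9 to 7.7 corresponds to roughly 1.69 nats per token, or about an 82% reduction in perplexity. This per-language figure is distinct from the overall fusion gain reported for the cross-lingual experiment.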
Abstract
Independently trained domain specialists can be fused post-hoc into a single model that outperforms any individual specialist, and the gain is predictable: gain = 0.82 × divergence - 2.72 (R^2 = 0.856, n=6, 3-26% divergence). This enables practitioners to estimate cooperative value before committing compute. Below ~3.3% divergence, gains approach zero.

In the KALAVAI protocol, contributors fine-tune copies of a shared checkpoint independently, then submit for lightweight MoE routing (500 steps). Gains are consistent: +7.72% at 410M (+/-0.02%, 3 seeds), +7.49% at 1B (+/-0.01%, 3 seeds), +6.53% at 6.9B, each over the best specialist. The router matches domain-oracle routing within <10^{-5} nats. Cross-lingual fusion (Tamil/Yoruba/Welsh/Code) achieves +21.76%, with Yoruba perplexity falling from 41.9 to 7.7. A 20-contributor federation achieves +16.71% (+/-0.07pp, 3 seeds).

Three requirements…
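The linear fit quoted in the abstract can be applied directly to estimate the fusion gain before training. The following is a minimal sketch, assuming both gain and divergence are expressed in percent (which is consistent with the stated 3-26% range and the ~3.3% break-even point). The function names are hypothetical, and estimates outside the fitted range are extrapolations from a six-point fit.

```python
# Coefficients of the fit reported in the abstract:
# gain = 0.82 * divergence - 2.72 (R^2 = 0.856, n = 6).
SLOPE = 0.82
INTERCEPT = -2.72

# Divergence range (percent) covered by the reported fit.
FIT_RANGE = (3.0, 26.0)


def predicted_gain(divergence_pct: float) -> float:
    """Estimated fusion gain (percent) for a given divergence (percent)."""
    return SLOPE * divergence_pct + INTERCEPT


def break_even_divergence() -> float:
    """Divergence (percent) at which the fitted gain crosses zero."""
    return -INTERCEPT / SLOPE


def within_fit_range(divergence_pct: float) -> bool:
    """True if the divergence lies inside the range the fit was made on."""
    low, high = FIT_RANGE
    return low <= divergence_pct <= high


if __name__ == "__main__":
    print(f"break-even divergence: {break_even_divergence():.2f}%")
    for d in (3.3, 10.0, 26.0):
        flag = "" if within_fit_range(d) else " (extrapolated)"
        print(f"divergence {d:>5.1f}% -> predicted gain {predicted_gain(d):+.2f}%{flag}")
```

The zero crossing of the fit, 2.72 / 0.82, is about 3.3%, which agrees with the abstract's statement that gains approach zero below that divergence.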
Code & Models
- 🤗 mechramc/kalavai-cross-lingual-yoruba-specialist-seed137 (model · 5 dl)
- 🤗 mechramc/kalavai-cross-lingual-yoruba-specialist-seed2026 (model)
- 🤗 mechramc/kalavai-cross-lingual-welsh-specialist-seed137 (model · 5 dl)
- 🤗 mechramc/kalavai-cross-lingual-welsh-specialist-seed2026 (model · 3 dl)
- 🤗 mechramc/kalavai-cross-lingual-tamil-specialist-seed137 (model · 2 dl)
- 🤗 mechramc/kalavai-cross-lingual-tamil-specialist-seed2026 (model · 2 dl)
- 🤗 mechramc/kalavai-cross-lingual-code-specialist-seed137 (model · 4 dl)
- 🤗 mechramc/kalavai-cross-lingual-code-specialist-seed2026 (model · 2 dl)
- 🤗 mechramc/kalavai-private-domain-medical-specialist-seed42 (model · 4 dl)
- 🤗 mechramc/kalavai-private-domain-legal-specialist-seed42 (model · 5 dl)
