Mixture of Modular Experts: Distilling Knowledge from a Multilingual Teacher into Specialized Modular Language Models
Mohammed Al-Maamari, Mehdi Ben Amor, Michael Granitzer

TL;DR
This paper introduces a modular multilingual language model framework that combines Knowledge Distillation (KD) and Mixture of Experts (MoE). It reports effective language classification, knowledge retention, and efficiency improvements, with open-source resources.
Contribution
The paper presents a novel integration of KD and MoE for multilingual models, compares modular MoE architectures, and addresses catastrophic forgetting with practical solutions.
Findings
Adaptive alpha in KD offers marginal improvements over fixed alpha.
The router classifier achieved 99.95% accuracy in language classification.
MoE with a common expert mitigates catastrophic forgetting effectively.
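The fixed-alpha distillation objective mentioned in the findings can be sketched as follows. This is a minimal illustration of the widely used formulation (hard-label cross-entropy blended with a temperature-softened KL term), not necessarily the exact loss used in the paper. The function names, the `temperature` parameter, and the choice of which term `alpha` weights are all assumptions. The adaptive-alpha variant would replace the constant `alpha` with a value computed during training; its rule is not given in this summary, so it is not reproduced here.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(student_logits, teacher_logits, target_index,
            alpha=0.5, temperature=2.0):
    """Blend of hard-label and soft-label losses (assumed formulation).

    alpha weights the cross-entropy against the ground-truth token;
    (1 - alpha) weights KL(teacher || student) at the given temperature.
    """
    # Hard-label term: cross-entropy of the student at temperature 1.
    hard = -math.log(softmax(student_logits)[target_index])

    # Soft-label term: KL divergence between softened distributions.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    soft = sum(t * (math.log(t) - math.log(s))
               for t, s in zip(p_teacher, p_student) if t > 0)

    # The temperature**2 factor keeps gradient scales comparable
    # across temperatures in the common formulation.
    return alpha * hard + (1 - alpha) * temperature ** 2 * soft
```

Calling `kd_loss(student, teacher, target_index, alpha=0.7)` with two logit lists of equal length returns a single scalar. When the student logits equal the teacher logits, the soft term vanishes and only the weighted cross-entropy remains.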
Abstract
This research combines Knowledge Distillation (KD) and Mixture of Experts (MoE) to develop modular, efficient multilingual language models. Key objectives include evaluating adaptive versus fixed alpha methods in KD and comparing modular MoE architectures for handling multi-domain inputs and preventing catastrophic forgetting. KD compresses large language models (LLMs) into smaller, efficient models, while MoE enhances modularity with specialized tasks. Experiments showed similar performance for both KD methods, with marginal improvements from adaptive alpha. A combined loss approach provided more stable learning. The router, trained to classify input sequences into English, French, German, or Python, achieved 99.95% precision, recall, and F1 score, with Logistic Regression being the most effective classifier. Evaluations of modular MoE architectures revealed that Pre-trained Language…
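The router described in the abstract, a classifier that assigns each input sequence to English, French, German, or Python, might be sketched as a small multinomial logistic regression. The character-bigram features, the toy training sentences, the learning rate, and every function name below are assumptions made for illustration; the summary does not state the paper's features or data, and this toy model makes no claim to the reported 99.95% scores.

```python
import math
from collections import Counter

LABELS = ["English", "French", "German", "Python"]

# Tiny hand-written toy corpus (hypothetical, for illustration only).
TRAIN = [
    ("the cat sat on the mat and looked at the dog", "English"),
    ("this is where we would like to go with them", "English"),
    ("le chat est sur la table avec les enfants", "French"),
    ("nous avons une maison dans cette ville", "French"),
    ("der Hund und die Katze sind nicht im Haus", "German"),
    ("ich habe das Buch mit einem Freund gelesen", "German"),
    ("def add(a, b):\n    return a + b", "Python"),
    ("for i in range(10):\n    print(i)", "Python"),
]

def featurize(text):
    """Relative frequencies of lowercase character bigrams."""
    text = text.lower()
    grams = Counter(text[i:i + 2] for i in range(len(text) - 1))
    total = sum(grams.values()) or 1
    return {g: c / total for g, c in grams.items()}

def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_proba(weights, bias, text):
    """Class probabilities from one linear score per label."""
    feats = featurize(text)
    scores = [bias[k] + sum(weights[k].get(g, 0.0) * v
                            for g, v in feats.items())
              for k in range(len(LABELS))]
    return softmax(scores)

def train(data, epochs=200, lr=1.0):
    """Plain stochastic gradient descent on the cross-entropy loss."""
    weights = [dict() for _ in LABELS]
    bias = [0.0] * len(LABELS)
    for _ in range(epochs):
        for text, label in data:
            feats = featurize(text)
            probs = predict_proba(weights, bias, text)
            for k in range(len(LABELS)):
                # Gradient of cross-entropy w.r.t. the k-th score.
                grad = probs[k] - (1.0 if LABELS[k] == label else 0.0)
                bias[k] -= lr * grad
                for g, v in feats.items():
                    weights[k][g] = weights[k].get(g, 0.0) - lr * grad * v
    return weights, bias

def route(weights, bias, text):
    """Return the label whose expert should handle the input."""
    probs = predict_proba(weights, bias, text)
    return LABELS[max(range(len(LABELS)), key=probs.__getitem__)]

WEIGHTS, BIAS = train(TRAIN)
```

In a modular setup, the label returned by `route(WEIGHTS, BIAS, text)` would select which expert processes the sequence. A linear classifier keeps that routing step cheap relative to the experts themselves.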
Taxonomy
Topics: Second Language Learning and Teaching · Innovative Teaching and Learning Methods
Methods: Logistic Regression · Mixture of Experts · Knowledge Distillation
