Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts
Junmo Kang, Leonid Karlinsky, Hongyin Luo, Zhen Wang, Jacob Hansen, James Glass, David Cox, Rameswar Panda, Rogerio Feris, Alan Ritter

TL;DR
Self-MoE turns a single large language model into a modular system of self-specialized experts trained on self-generated synthetic data, improving on the base LLM by 6.5 percentage points on average across knowledge, reasoning, math, and coding benchmarks while maintaining flexibility and interpretability.
Contribution
The paper proposes Self-MoE, a novel method for transforming monolithic LLMs into compositional systems with self-generated experts, reducing reliance on labeled data and enabling dynamic task handling.
Findings
Self-MoE improves on the base LLM by 6.5 percentage points on average across knowledge, reasoning, math, and coding benchmarks.
Specializing an LLM may trade off performance on non-specialized tasks.
Self-MoE outperforms other merging methods in flexibility and interpretability.
Abstract
We present Self-MoE, an approach that transforms a monolithic LLM into a compositional, modular system of self-specialized experts, named MiXSE (MiXture of Self-specialized Experts). Our approach leverages self-specialization, which constructs expert modules using self-generated synthetic data, each equipping a shared base LLM with distinct domain-specific capabilities, activated via self-optimized routing. This allows for dynamic and capability-specific handling of various target tasks, enhancing overall capabilities, without extensive human-labeled data and added parameters. Our empirical results reveal that specializing LLMs may exhibit potential trade-offs in performance on non-specialized tasks. On the other hand, our Self-MoE demonstrates substantial improvements (6.5%p on average) over the base LLM across diverse benchmarks such as knowledge, reasoning, math, and coding. It also…
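The routing idea in the abstract (a shared base model plus expert modules selected per input) can be illustrated with a toy layer. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not say how experts or the router are parameterized, so the low-rank (LoRA-style) expert form, the top-k softmax gate, the random weights, and every name here (`MixtureOfExpertsLayer`, `route`, `forward`) are hypothetical choices made for illustration only.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def rand_matrix(rows, cols, rng, scale=0.1):
    """Small random matrix standing in for trained weights."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


class MixtureOfExpertsLayer:
    """Toy layer: shared base weight plus gated low-rank experts (assumed form)."""

    def __init__(self, dim, rank, num_experts, top_k, seed=0):
        rng = random.Random(seed)
        self.top_k = top_k
        # Shared base weight, used for every input.
        self.base = rand_matrix(dim, dim, rng)
        # Each expert is a (down, up) pair of low-rank matrices (assumption).
        self.experts = [
            (rand_matrix(rank, dim, rng), rand_matrix(dim, rank, rng))
            for _ in range(num_experts)
        ]
        # Router: one logit per expert, computed from the input.
        self.router = rand_matrix(num_experts, dim, rng)

    def route(self, x):
        """Return {expert_index: gate_weight} for the top-k experts."""
        logits = matvec(self.router, x)
        ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
        chosen = ranked[: self.top_k]
        gates = softmax([logits[i] for i in chosen])
        return dict(zip(chosen, gates))

    def forward(self, x):
        """Base output plus the gate-weighted sum of selected expert updates."""
        out = matvec(self.base, x)
        for index, gate in self.route(x).items():
            down, up = self.experts[index]
            delta = matvec(up, matvec(down, x))
            out = [o + gate * d for o, d in zip(out, delta)]
        return out


layer = MixtureOfExpertsLayer(dim=4, rank=2, num_experts=4, top_k=2)
x = [1.0, 0.5, -0.5, 2.0]
gates = layer.route(x)  # which experts fire, and with what weight
y = layer.forward(x)    # output vector of the same dimension as x
```

In this sketch the gate weights double as a simple readout of which expert handled an input, which is one way the interpretability claim could be made concrete.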
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Biomedical Text Mining and Ontologies
Methods: Balanced Selection
