Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts

Junmo Kang; Leonid Karlinsky; Hongyin Luo; Zhen Wang; Jacob Hansen; James Glass; David Cox; Rameswar Panda; Rogerio Feris; Alan Ritter

arXiv:2406.12034 · cs.CL · October 8, 2024 · 2 cites

TL;DR

Self-MoE turns a single large language model into a modular system of self-specialized experts, each built from self-generated synthetic data. It improves on the base LLM by 6.5 percentage points on average across knowledge, reasoning, math, and coding benchmarks while maintaining flexibility and interpretability.

Contribution

The paper proposes Self-MoE, a novel method for transforming monolithic LLMs into compositional systems with self-generated experts, reducing reliance on labeled data and enabling dynamic task handling.

Findings

01. Self-MoE improves on the base LLM by 6.5 percentage points on average across benchmarks.

02. Specializing an LLM can trade off performance on non-specialized tasks.

03. Self-MoE outperforms other merging methods in flexibility and interpretability.

Abstract

We present Self-MoE, an approach that transforms a monolithic LLM into a compositional, modular system of self-specialized experts, named MiXSE (MiXture of Self-specialized Experts). Our approach leverages self-specialization, which constructs expert modules using self-generated synthetic data, each equipping a shared base LLM with distinct domain-specific capabilities, activated via self-optimized routing. This allows for dynamic and capability-specific handling of various target tasks, enhancing overall capabilities, without extensive human-labeled data and added parameters. Our empirical results reveal that specializing LLMs may exhibit potential trade-offs in performances on non-specialized tasks. On the other hand, our Self-MoE demonstrates substantial improvements (6.5%p on average) over the base LLM across diverse benchmarks such as knowledge, reasoning, math, and coding. It also…
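To make the idea of expert modules on a shared base LLM with routing more concrete, here is a minimal sketch of one mixture layer. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say what form the experts or the router take, so the low-rank experts, the linear router, the softmax over the top-k logits, and every name below (`LowRankExpert`, `MixtureLayer`, `route`, `forward`) are hypothetical.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


class LowRankExpert:
    """Hypothetical expert: a low-rank update B(Ax) on top of the base layer."""

    def __init__(self, d_in, d_out, rank, rng):
        self.A = random_matrix(rank, d_in, rng)   # projects down to `rank`
        self.B = random_matrix(d_out, rank, rng)  # projects back up to `d_out`

    def delta(self, x):
        return matvec(self.B, matvec(self.A, x))


class MixtureLayer:
    """Shared base weight plus routed expert updates (assumed design)."""

    def __init__(self, d_in, d_out, n_experts, rank=2, top_k=1, seed=0):
        rng = random.Random(seed)
        self.W = random_matrix(d_out, d_in, rng)  # shared base weight
        self.experts = [LowRankExpert(d_in, d_out, rank, rng)
                        for _ in range(n_experts)]
        self.router = random_matrix(n_experts, d_in, rng)  # one logit per expert
        self.top_k = top_k

    def route(self, x):
        """Return (expert_index, weight) pairs for the top-k experts."""
        logits = matvec(self.router, x)
        top = sorted(range(len(logits)), key=lambda i: logits[i],
                     reverse=True)[:self.top_k]
        weights = softmax([logits[i] for i in top])
        return list(zip(top, weights))

    def forward(self, x):
        """Base output plus the weighted updates of the selected experts."""
        y = matvec(self.W, x)
        for index, weight in self.route(x):
            update = self.experts[index].delta(x)
            y = [yi + weight * ui for yi, ui in zip(y, update)]
        return y


layer = MixtureLayer(d_in=4, d_out=3, n_experts=4, top_k=2)
x = [1.0, -0.5, 0.25, 2.0]
routes = layer.route(x)
output = layer.forward(x)
```

In this sketch the base weight `W` is shared by every input, and only the experts chosen by `route` contribute an update, which is one way to read "activated via self-optimized routing". Here the router and experts are randomly initialized; training them is out of scope for the sketch.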

Videos

Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts · SlidesLive

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Biomedical Text Mining and Ontologies

Methods: Balanced Selection