TL;DR
This paper investigates whether domain-specific experts exist in MoE-based LLMs and introduces a training-free framework, DSMoE, that enhances domain specialization without additional inference costs.
Contribution
The study provides empirical evidence for domain-specific experts in MoE-based LLMs and proposes DSMoE, a novel, training-free method to improve domain specialization and generalization.
Findings
Empirical evidence of domain-specific experts in MoE-based LLMs.
DSMoE outperforms baseline models without extra training or inference costs.
The method demonstrates strong performance across multiple domains and models.
Abstract
In the era of Large Language Models (LLMs), the Mixture of Experts (MoE) architecture has emerged as an effective approach for training extremely large models with improved computational efficiency. This success builds upon extensive prior research aimed at enhancing expert specialization in MoE-based LLMs. However, the nature of such specializations and how they can be systematically interpreted remain open research challenges. In this work, we investigate this gap by posing a fundamental question: Do domain-specific experts exist in MoE-based LLMs? To answer the question, we evaluate ten advanced MoE-based LLMs ranging from 3.8B to 120B parameters and provide empirical evidence for the existence of domain-specific experts. Building on this finding, we propose Domain Steering Mixture of Experts (DSMoE), a training-free framework that introduces zero additional…
