DBES: A Systematic Benchmark and Metric Suite for Evaluating Expert Specialization in Large-Scale MoEs
Jing Wang, Hongxuan Lu, Jazze Young, Shu Wang, Zhimin Xin

TL;DR
DBES introduces a comprehensive benchmark and metrics suite for evaluating expert specialization in large-scale MoE models, enabling better understanding and optimization of these systems.
Contribution
It provides the first systematic methodology to evaluate expert specialization independently of accuracy, with actionable metrics validated through domain-specific training improvements.
Findings
Qwen-series models show modular specialization with high domain isolation.
DeepSeek and GLM models instead employ distributed collaboration.
Using DBES metrics, domain-specific training improved performance by up to 94.48%.
Abstract
Expert specialization in Mixture-of-Experts (MoE) models remains poorly understood, with traditional evaluations conflating architectural load-balancing with functional specialization. We introduce DBES, a comprehensive diagnostic framework combining a multi-domain benchmark with five theoretically grounded metrics: Routing Specialization, Normalized Effective Rank, Domain Isolation, Routing Stiffness Score, and N-gram Expertise measures. Critical findings demonstrate distinct specialization paradigms across models: Qwen-series models exhibit modular specialization with high domain isolation, while DeepSeek and GLM employ distributed collaboration. However, we emphasize that specialization is a diagnostic dimension, necessary but not sufficient for downstream performance. Most crucially, interventional evidence validates the actionability of these metrics: by using DBES to identify…
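The abstract names the DBES metrics but does not define them on this page. Purely as an illustration, the sketch below computes two quantities in the same spirit. The first is the standard effective rank of a matrix (the exponential of the Shannon entropy of its normalized singular values, as defined by Roy and Vetterli), divided by its maximum possible rank. The second is a hypothetical routing-concentration proxy: one minus the normalized entropy of each domain's expert-usage distribution. The function names, the normalization by `min(matrix.shape)`, and the `counts` layout (domains × experts) are assumptions, not the DBES definitions.

```python
import numpy as np


def normalized_effective_rank(matrix: np.ndarray, eps: float = 1e-12) -> float:
    """Effective rank (Roy & Vetterli) divided by min(matrix.shape).

    Assumption: DBES's exact normalization is not given here; dividing by
    the maximum possible rank keeps the result in (0, 1].
    """
    singular_values = np.linalg.svd(matrix, compute_uv=False)
    p = singular_values / (singular_values.sum() + eps)  # normalized spectrum
    p = p[p > 0]  # drop exact zeros before taking logs
    entropy = -np.sum(p * np.log(p))  # Shannon entropy of the spectrum
    return float(np.exp(entropy) / min(matrix.shape))


def routing_specialization_proxy(counts: np.ndarray) -> np.ndarray:
    """Hypothetical per-domain concentration score (not the DBES metric).

    counts[d, e] = number of tokens from domain d routed to expert e.
    Returns 1 - H(p_d) / log(num_experts): 0 for uniform expert usage,
    1 when every token of the domain goes to a single expert.
    """
    probs = counts / counts.sum(axis=1, keepdims=True)
    # 0 * log(0) yields nan; silence the warnings and treat those terms as 0.
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(probs > 0, probs * np.log(probs), 0.0)
    entropy = -terms.sum(axis=1)
    return 1.0 - entropy / np.log(counts.shape[1])


if __name__ == "__main__":
    print(normalized_effective_rank(np.eye(8)))  # close to 1.0: flat spectrum
    counts = np.array([[25, 25, 25, 25],   # domain spread evenly over 4 experts
                       [97, 1, 1, 1]])     # domain concentrated on expert 0
    print(routing_specialization_proxy(counts))  # first ~0 (uniform), second ~0.88
```

Both scores are entropy-based, so they are insensitive to which expert is favored and only measure how concentrated the spectrum or the routing distribution is; distinguishing load-balancing from functional specialization, as the abstract emphasizes, would need additional per-domain comparisons.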
