DBES: A Systematic Benchmark and Metric Suite for Evaluating Expert Specialization in Large-Scale MoEs

Jing Wang, Hongxuan Lu, Jazze Young, Shu Wang, Zhimin Xin

arXiv:2605.18498 · cs.LG · May 19, 2026

TL;DR

DBES is a multi-domain benchmark paired with a five-metric suite for diagnosing expert specialization in large-scale MoE models, separating functional specialization from architectural load balancing.

Contribution

DBES provides the first systematic methodology for evaluating expert specialization independently of accuracy, with actionable metrics validated by gains from domain-specific training.

Findings

01. Qwen-series models show modular specialization with high domain isolation.

02. DeepSeek and GLM employ distributed collaboration.

03. Domain-specific training guided by DBES metrics improved performance by up to 94.48%.

Abstract

Expert specialization in Mixture-of-Experts (MoE) models remains poorly understood, with traditional evaluations conflating architectural load-balancing with functional specialization. We introduce DBES, a comprehensive diagnostic framework combining a multi-domain benchmark with five theoretically grounded metrics: Routing Specialization, Normalized Effective Rank, Domain Isolation, Routing Stiffness Score, and N-gram Expertise measures. Key findings reveal distinct specialization paradigms across models: Qwen-series models exhibit modular specialization with high domain isolation, while DeepSeek and GLM employ distributed collaboration. However, we emphasize that specialization is a diagnostic dimension, necessary but not sufficient for downstream performance. Most crucially, interventional evidence validates the actionability of these metrics: by using DBES to identify…
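The abstract's distinction between load balancing and functional specialization can be illustrated with a toy example. The sketch below is not one of the DBES metrics, whose definitions are not given on this page. It assumes a hypothetical domain-by-expert table of routed token counts and uses normalized mutual information between domain and expert as a stand-in specialization score. The function names, the score, and the counts are all assumptions made for illustration; the variable names `modular` and `distributed` only borrow the abstract's vocabulary and do not represent measurements of any real model.

```python
import math

def expert_load(counts):
    """Fraction of all routed tokens handled by each expert (column marginals)."""
    total = sum(sum(row) for row in counts)
    n_experts = len(counts[0])
    return [sum(row[e] for row in counts) / total for e in range(n_experts)]

def normalized_mutual_information(counts):
    """Stand-in specialization score: I(domain; expert) / log(min(#domains, #experts)).

    0 means routing ignores the domain; 1 means each domain has its own experts.
    This is an illustrative assumption, not a metric defined by DBES.
    """
    total = sum(sum(row) for row in counts)
    p_domain = [sum(row) / total for row in counts]
    p_expert = expert_load(counts)
    mi = 0.0
    for d, row in enumerate(counts):
        for e, c in enumerate(row):
            if c > 0:  # empty cells contribute nothing to the sum
                p = c / total
                mi += p * math.log(p / (p_domain[d] * p_expert[e]))
    return mi / math.log(min(len(counts), len(counts[0])))

# Hypothetical token counts: rows are domains, columns are experts.
modular = [
    [100, 0, 0, 0],
    [0, 100, 0, 0],
    [0, 0, 100, 0],
    [0, 0, 0, 100],
]
distributed = [[25, 25, 25, 25] for _ in range(4)]

for name, table in [("modular", modular), ("distributed", distributed)]:
    print(name, expert_load(table), round(normalized_mutual_information(table), 6))
```

Both tables give every expert exactly a quarter of the tokens, so a load-balancing check cannot tell them apart, yet the stand-in score is at its maximum for the first table and zero for the second. This is the kind of conflation the abstract says a load-based evaluation would miss.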
