Certain Head, Uncertain Tail: Expert-Sample for Test-Time Scaling in Fine-Grained MoE
Yuanteng Chen, Peisong Wang, Nanxin Zeng, Yuantian Shao, Shuang Qiu, Gang Li, Jing Liu, Jian Cheng

TL;DR
This paper introduces Expert-Sample, a training-free method that increases generation diversity in fine-grained MoE models by injecting stochasticity only into the uncertain, low-confidence part of expert routing, which improves reasoning accuracy under test-time scaling.
Contribution
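It characterizes a router-score pattern in fine-grained MoE (a certain head of high-confidence experts followed by an uncertain tail of low-confidence candidates) and proposes a sampling method built on that pattern that improves performance without retraining.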
Findings
Expert-Sample improves pass@n and verification accuracy across multiple tasks.
On Qwen3-30B-A3B-Instruct, pass@32 increases from 85.4% to 91.9%.
Accuracy on GPQA-Diamond improves from 59.1% to 62.6%.
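For readers unfamiliar with the metric, pass@n is the probability that at least one of n sampled solutions is correct. The sketch below uses the standard unbiased estimator that is common in the code-generation literature; whether the paper computes pass@n this way or by directly checking n samples is not stated here, so treat the function name `pass_at_n` and its arguments as illustrative assumptions.

```python
from math import comb


def pass_at_n(num_samples: int, num_correct: int, n: int) -> float:
    """Estimate pass@n for one problem.

    num_samples: total generations drawn for the problem
    num_correct: how many of those generations are correct
    n:           sampling budget being evaluated (n <= num_samples)
    """
    if not 0 <= num_correct <= num_samples:
        raise ValueError("num_correct must lie in [0, num_samples]")
    if not 1 <= n <= num_samples:
        raise ValueError("n must lie in [1, num_samples]")
    # Fewer than n incorrect samples: every size-n subset contains a correct one.
    if num_samples - num_correct < n:
        return 1.0
    # 1 - P(all n drawn samples are incorrect), drawing without replacement.
    return 1.0 - comb(num_samples - num_correct, n) / comb(num_samples, n)


if __name__ == "__main__":
    # Hypothetical counts, not results from the paper.
    print(pass_at_n(num_samples=64, num_correct=5, n=32))
```

A benchmark-level score would then be the mean of this value over all problems.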
Abstract
Test-time scaling improves LLM performance by generating multiple candidate solutions, yet token-level sampling requires temperature tuning that trades off diversity against stability. Fine-grained MoE, featuring hundreds of well-trained experts per layer and multi-expert activation per token, offers an unexplored alternative through its rich routing space. We empirically characterize fine-grained MoE routing and uncover an informative pattern: router scores exhibit a certain head of high-confidence experts followed by an uncertain tail of low-confidence candidates. While single-run greedy accuracy remains stable when fewer experts are activated, multi-sample pass@n degrades significantly, suggesting that the certain head governs core reasoning capability while the uncertain tail correlates with reasoning diversity. Motivated by these findings, we propose Expert-Sample, a training-free…
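The abstract is truncated before the method is described, so the following is only a rough sketch of the general idea of "certain head, uncertain tail" routing, not the authors' algorithm. It assumes that the top `head_size` experts by router score are always kept, and that the remaining activation slots are filled by sampling without replacement from all other experts with softmax-weighted probabilities. The function name `select_experts` and the parameters `head_size` and `tail_temperature` are hypothetical.

```python
import math
import random


def select_experts(router_logits, top_k, head_size, rng, tail_temperature=1.0):
    """Return top_k expert indices: a deterministic head plus a sampled tail.

    Assumption (not from the paper): the tail is sampled from every expert
    outside the head, weighted by a softmax over its router logits.
    """
    if not 0 <= head_size <= top_k <= len(router_logits):
        raise ValueError("need 0 <= head_size <= top_k <= number of experts")
    if tail_temperature <= 0:
        raise ValueError("tail_temperature must be positive")

    # Rank experts from highest to lowest router score.
    order = sorted(range(len(router_logits)),
                   key=lambda i: router_logits[i], reverse=True)
    head = order[:head_size]          # certain head: always activated
    candidates = order[head_size:]    # uncertain tail: sampled

    tail = []
    for _ in range(top_k - head_size):
        # Recompute a numerically stable softmax over the remaining candidates.
        peak = max(router_logits[i] for i in candidates)
        weights = [math.exp((router_logits[i] - peak) / tail_temperature)
                   for i in candidates]
        pick = rng.choices(candidates, weights=weights)[0]
        tail.append(pick)
        candidates.remove(pick)       # sample without replacement
    return head + tail


if __name__ == "__main__":
    # Hypothetical router logits for one token over six experts.
    logits = [4.0, 3.5, 0.2, 0.1, 0.0, -0.1]
    rng = random.Random(0)
    print(select_experts(logits, top_k=4, head_size=2, rng=rng))
```

Setting `head_size` equal to `top_k` recovers ordinary greedy top-k routing, which makes the deterministic baseline a special case of the sketch.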
