Certain Head, Uncertain Tail: Expert-Sample for Test-Time Scaling in Fine-Grained MoE

Yuanteng Chen; Peisong Wang; Nanxin Zeng; Yuantian Shao; Shuang Qiu; Gang Li; Jing Liu; Jian Cheng

arXiv:2602.02443·cs.LG·May 4, 2026

TL;DR

This paper introduces Expert-Sample, a training-free method that increases generation diversity in fine-grained MoE models by injecting stochasticity only into low-confidence (uncertain-tail) expert selections, improving multi-sample reasoning accuracy.

Contribution

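It characterizes a router-score pattern in fine-grained MoE, a certain head of high-confidence experts followed by an uncertain tail of low-confidence candidates, and proposes a sampling method built on that pattern that improves performance without retraining.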

Findings

01 Expert-Sample improves pass@n and verification accuracy across multiple tasks.

02 On Qwen3-30B-A3B-Instruct, pass@32 increases from 85.4% to 91.9%.

03 Accuracy on GPQA-Diamond improves from 59.1% to 62.6%.
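
For readers unfamiliar with the pass@n metric cited in the findings, the sketch below shows one common way to estimate it: the standard unbiased estimator 1 - C(n-c, k) / C(n, k), computed from n generated samples of which c are correct. Whether the paper computes pass@32 exactly this way is an assumption; the function names `pass_at_k` and `mean_pass_at_k` are illustrative only.

```python
from math import comb
from typing import Iterable, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem from n samples, c of them correct.

    Uses the unbiased estimator 1 - C(n - c, k) / C(n, k): the probability
    that a random size-k subset of the n samples contains a correct one.
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k incorrect samples: every size-k subset has a hit.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: Iterable[Tuple[int, int]], k: int) -> float:
    """Average pass@k over problems; each item is (n_samples, n_correct)."""
    scores = [pass_at_k(n, c, k) for n, c in results]
    if not scores:
        raise ValueError("results must not be empty")
    return sum(scores) / len(scores)
```

With k equal to n, the estimate for a problem is simply 1 if any sample is correct and 0 otherwise, so a benchmark-level pass@32 from 32 samples per problem is the fraction of problems solved at least once.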

Abstract

Test-time scaling improves LLM performance by generating multiple candidate solutions, yet token-level sampling requires temperature tuning that trades off diversity against stability. Fine-grained MoE, featuring hundreds of well-trained experts per layer and multi-expert activation per token, offers an unexplored alternative through its rich routing space. We empirically characterize fine-grained MoE routing and uncover an informative pattern: router scores exhibit a certain head of high-confidence experts followed by an uncertain tail of low-confidence candidates. While single-run greedy accuracy remains stable when fewer experts are activated, multi-sample pass@n degrades significantly, suggesting that the certain head governs core reasoning capability while the uncertain tail correlates with reasoning diversity. Motivated by these findings, we propose Expert-Sample, a training-free…
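
The abstract is truncated before it describes the actual selection rule, so the following is only a hypothetical illustration of the head/tail idea, not the authors' algorithm. It assumes that, for one token, the `head_size` highest-scoring experts are always kept and the remaining slots are filled by sampling without replacement, in proportion to router score, from the next-ranked candidates. The names `select_experts`, `head_size`, and `pool_size` are invented for this sketch, and router scores are assumed to be strictly positive (for example, softmax outputs).

```python
import random
from typing import List, Optional, Sequence


def select_experts(
    scores: Sequence[float],
    k: int,
    head_size: int,
    pool_size: int,
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Pick k expert indices for one token (hypothetical head/tail rule).

    scores    -- router scores, one per expert (assumed strictly positive)
    k         -- number of experts to activate
    head_size -- top-ranked experts that are always kept (the "certain head")
    pool_size -- rank cutoff of candidates eligible for the sampled slots
    """
    if not (0 <= head_size <= k <= pool_size <= len(scores)):
        raise ValueError("require 0 <= head_size <= k <= pool_size <= len(scores)")
    if rng is None:
        rng = random.Random()

    # Rank experts from highest to lowest router score.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    # Deterministic part: the high-confidence head.
    head = ranked[:head_size]

    # Stochastic part: draw the remaining slots from the tail candidates,
    # without replacement, weighted by router score.
    candidates = ranked[head_size:pool_size]
    weights = [scores[i] for i in candidates]
    tail: List[int] = []
    for _ in range(k - head_size):
        pick = rng.choices(range(len(candidates)), weights=weights, k=1)[0]
        tail.append(candidates.pop(pick))
        weights.pop(pick)

    return head + tail
```

Setting `head_size == k` recovers ordinary deterministic top-k routing, while a smaller `head_size` lets repeated runs on the same prompt activate different low-confidence experts, which is the kind of diversity the abstract associates with the uncertain tail.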
