Two Experts Are All You Need for Steering Thinking: Reinforcing Cognitive Effort in MoE Reasoning Models Without Additional Training

Mengru Wang; Xingyu Chen; Yue Wang; Zhiwei He; Jiahao Xu; Tian Liang; Qiuzhi Liu; Yunzhi Yao; Wenxuan Wang; Ruotian Ma; Haitao Mi; Ningyu Zhang; Zhaopeng Tu; Xiaolong Li; Dong Yu

arXiv:2505.14681·cs.AI·May 28, 2025

Open Access

TL;DR

This paper introduces RICE, an inference-time method that improves reasoning accuracy and efficiency in Mixture-of-Experts (MoE) reasoning models by identifying and reinforcing specialized "cognitive experts", without additional training.

Contribution

The paper proposes Reinforcing Cognitive Experts (RICE), a new inference-time steering technique that improves reasoning in MoE models without extra training or complex heuristics.

Findings

01 RICE significantly improves reasoning accuracy across benchmarks.

02 It enhances cognitive efficiency and cross-domain generalization.

03 It outperforms existing reasoning-steering techniques.

Abstract

Mixture-of-Experts (MoE) architectures within Large Reasoning Models (LRMs) have achieved impressive reasoning capabilities by selectively activating experts to facilitate structured cognitive processes. Despite notable advances, existing reasoning models often suffer from cognitive inefficiencies such as overthinking and underthinking. To address these limitations, we introduce a novel inference-time steering methodology called Reinforcing Cognitive Experts (RICE), designed to improve reasoning performance without additional training or complex heuristics. Leveraging normalized Pointwise Mutual Information (nPMI), we systematically identify specialized experts, termed "cognitive experts", that orchestrate meta-level reasoning operations characterized by tokens like "<think>". Empirical evaluations with leading MoE-based LRMs (DeepSeek-R1 and Qwen3-235B) on rigorous quantitative and…
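The abstract names nPMI as the signal for finding cognitive experts but does not show the computation here. The sketch below illustrates the standard nPMI definition, PMI(x, y) / -log p(x, y), applied to the co-occurrence of "expert e was routed at this position" and "the token at this position is a reasoning marker". It is not the authors' implementation: the function names, the `routed` data layout, and the toy routing trace are all hypothetical, and the paper's actual counting and selection procedure may differ.

```python
import math
from collections import Counter


def npmi(joint, n_expert, n_marker, total):
    """Standard normalized PMI from raw counts; ranges over [-1, 1]."""
    if joint == 0:
        return -1.0  # the two events never co-occur
    p_xy = joint / total
    if p_xy == 1.0:
        return 1.0  # degenerate case: both events occur at every position
    p_x = n_expert / total
    p_y = n_marker / total
    pmi = math.log(p_xy / (p_x * p_y))
    return pmi / -math.log(p_xy)


def rank_experts(tokens, routed, markers, top_k=2):
    """Rank experts by nPMI with marker tokens (hypothetical helper).

    tokens  : list of token strings, one per position
    routed  : list of sets of expert ids activated at each position (assumed layout)
    markers : set of marker tokens, e.g. {"<think>"}
    """
    total = len(tokens)
    n_marker = sum(tok in markers for tok in tokens)
    expert_counts, joint_counts = Counter(), Counter()
    for tok, experts in zip(tokens, routed):
        for e in experts:
            expert_counts[e] += 1
            if tok in markers:
                joint_counts[e] += 1
    scores = {
        e: npmi(joint_counts[e], expert_counts[e], n_marker, total)
        for e in expert_counts
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


# Toy routing trace (made up for illustration, not data from the paper).
toy_tokens = ["<think>", "a", "b", "<think>", "c", "d"]
toy_routed = [{0, 1}, {1, 2}, {1, 3}, {0, 2}, {1, 3}, {2, 3}]
ranked = rank_experts(toy_tokens, toy_routed, {"<think>"})
```

In this toy trace, expert 0 fires only on the marker positions and so ranks first. Keeping `top_k=2` mirrors the "two experts" of the title, though how the selected experts are then reinforced at inference time is not covered by this sketch.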

Taxonomy

Topics: Cognitive Science and Mapping · AI-based Problem Solving and Planning