HyPER: Bridging Exploration and Exploitation for Scalable LLM Reasoning with Hypothesis Path Expansion and Reduction

Shengxuan Qiu; Haochen Huang; Shuzhang Zhong; Pengfei Zuo; Meng Li

arXiv:2602.06527 · cs.AI · May 12, 2026

TL;DR

HyPER is a training-free dynamic control policy for multi-path decoding in large language models. It balances exploration and exploitation at inference time to improve both reasoning accuracy and token efficiency.

Contribution

The paper presents HyPER, a training-free online control method that adaptively balances hypothesis exploration and exploitation during inference in mixture-of-experts models.

Findings

01. HyPER improves reasoning accuracy by 8-10% across benchmarks.

02. It reduces token usage by 25-40% while maintaining or improving performance.

03. HyPER outperforms existing static exploration strategies in efficiency and accuracy.

Abstract

Scaling test-time compute with multi-path chain-of-thought improves reasoning accuracy, but its effectiveness depends critically on the exploration-exploitation trade-off. Existing approaches address this trade-off in rigid ways: tree-structured search hard-codes exploration through brittle expansion rules that interfere with post-trained reasoning, while parallel reasoning over-explores redundant hypothesis paths and relies on weak answer selection. Motivated by the observation that the optimal balance is phase-dependent and that correct and incorrect reasoning paths often diverge only at late stages, we reformulate test-time scaling as a dynamic expand-reduce control problem over a pool of hypotheses. We propose HyPER, a training-free online control policy for multi-path decoding in mixture-of-experts models that reallocates computation under a fixed budget using lightweight path…
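
One way to picture the expand-reduce framing described in the abstract is the toy loop below. It is an illustrative sketch only, not the authors' algorithm: the abstract is cut off before it names the path signals HyPER uses, so `Hypothesis.score`, the 0.5 phase threshold, `max_pool`, and the fixed per-step token cost are all hypothetical placeholders. The sketch only shows the general shape of a controller that branches early, prunes late, and never exceeds a fixed token budget.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    text: str     # label for a partial reasoning path (stand-in for decoded tokens)
    score: float  # hypothetical path-quality signal; static in this toy


def expand_reduce_step(pool, progress, max_pool=4):
    """Apply one control decision to the hypothesis pool.

    progress is the fraction of the token budget already spent, in [0, 1].
    Assumed schedule (not from the paper): expand while progress < 0.5,
    reduce afterwards.
    """
    ranked = sorted(pool, key=lambda h: h.score, reverse=True)
    if progress < 0.5 and len(ranked) < max_pool:
        # Expand: fork the current best path into a new sibling branch.
        best = ranked[0]
        ranked.append(Hypothesis(f"{best.text}.{len(ranked)}", best.score))
        return ranked
    if progress >= 0.5 and len(ranked) > 1:
        # Reduce: drop the lowest-scoring path to concentrate the remaining budget.
        return ranked[:-1]
    return ranked


def run(budget_tokens=64, tokens_per_step=4):
    """Alternate control decisions and decoding until the budget is exhausted."""
    pool = [Hypothesis("A", 0.6), Hypothesis("B", 0.4)]
    spent = 0
    while True:
        pool = expand_reduce_step(pool, spent / budget_tokens)
        cost = tokens_per_step * len(pool)  # every live path decodes one chunk
        if spent + cost > budget_tokens:
            break  # the next round would exceed the fixed budget
        spent += cost
    return pool, spent


if __name__ == "__main__":
    final_pool, tokens_spent = run()
    print([h.text for h in final_pool], tokens_spent)
```

In a real system the score would be recomputed from the model as decoding proceeds, and the expand or reduce decision would depend on those signals rather than on a fixed progress threshold.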
