HyPER: Bridging Exploration and Exploitation for Scalable LLM Reasoning with Hypothesis Path Expansion and Reduction
Shengxuan Qiu, Haochen Huang, Shuzhang Zhong, Pengfei Zuo, Meng Li

TL;DR
HyPER introduces a dynamic control policy for multi-path decoding in large language models, effectively balancing exploration and exploitation to improve reasoning accuracy and efficiency without additional training.
Contribution
The paper presents HyPER, a training-free online control policy that adaptively expands and reduces a pool of hypothesis paths during multi-path decoding in mixture-of-experts models.
Findings
HyPER improves reasoning accuracy by 8-10% across benchmarks.
It reduces token usage by 25-40% while maintaining or improving performance.
HyPER outperforms existing static exploration strategies in efficiency and accuracy.
Abstract
Scaling test-time compute with multi-path chain-of-thought improves reasoning accuracy, but its effectiveness depends critically on the exploration-exploitation trade-off. Existing approaches address this trade-off in rigid ways: tree-structured search hard-codes exploration through brittle expansion rules that interfere with post-trained reasoning, while parallel reasoning over-explores redundant hypothesis paths and relies on weak answer selection. Motivated by the observation that the optimal balance is phase-dependent and that correct and incorrect reasoning paths often diverge only at late stages, we reformulate test-time scaling as a dynamic expand-reduce control problem over a pool of hypotheses. We propose HyPER, a training-free online control policy for multi-path decoding in mixture-of-experts models that reallocates computation under a fixed budget using lightweight path…
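The expand-reduce framing in the abstract can be illustrated with a toy control loop. This is a minimal sketch, not the paper's algorithm: the `Hypothesis` class, the `confidence` signal, the `low`/`high` thresholds, and the `max_width` cap (standing in for the fixed compute budget) are all hypothetical names and values chosen for illustration. It only shows the general idea of widening the pool when no path looks promising and pruning it once one does.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    text: str
    confidence: float  # hypothetical per-path signal in [0, 1]; higher is better


def choose_action(pool, max_width, low=0.4, high=0.8):
    """Toy rule: pick 'expand', 'reduce', or 'continue' from pool statistics."""
    best = max(h.confidence for h in pool)
    if best < low and len(pool) < max_width:
        return "expand"    # nothing looks promising yet: explore another path
    if best >= high and len(pool) > 1:
        return "reduce"    # one path looks strong: prune the rest and exploit
    return "continue"      # otherwise keep decoding the current pool


def step(pool, max_width, keep=1):
    """Apply one control decision and return (action, new_pool)."""
    action = choose_action(pool, max_width)
    if action == "expand":
        # Branch from the currently most confident path (an assumption).
        parent = max(pool, key=lambda h: h.confidence)
        pool = pool + [Hypothesis(parent.text + " [branch]", parent.confidence)]
    elif action == "reduce":
        # Keep only the top-`keep` paths by confidence.
        pool = sorted(pool, key=lambda h: h.confidence, reverse=True)[:keep]
    return action, pool
```

In this sketch, calling `step` once per decoding chunk lets the pool width change over time instead of being fixed up front, which is the contrast the abstract draws with static tree search and fixed-width parallel sampling.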
