Dual-Dimensional Consistency: Balancing Budget and Quality in Adaptive Inference-Time Scaling
Rongman Xu, Yifei Li, Tianzhe Zhao, Yanrui Wu, Bo Li, Hang Yan

TL;DR
This paper introduces Dual-Dimensional Consistency (DDC), a framework for adaptive inference-time scaling in LLMs that balances reasoning quality against resource efficiency, cutting token use by more than an order of magnitude while maintaining accuracy.
Contribution
It presents a novel unified approach that combines a Confidence-Weighted Bayesian protocol with Trend-Aware Stratified Pruning to improve both the efficiency and the reasoning quality of LLM inference.
Findings
Reduces token consumption by more than 10x.
Maintains or exceeds baseline accuracy across five benchmarks.
Effectively filters hallucinations while accelerating reasoning.
Abstract
Large Language Models (LLMs) have demonstrated remarkable reasoning abilities. However, maximizing their potential through inference-time scaling requires trading off sampling budget against reasoning quality. Current strategies remain inefficient because they typically treat sampling width and depth as orthogonal objectives: width-consensus methods risk reinforcing hallucinations, while depth-pruning mechanisms prematurely truncate complex yet valid reasoning chains. We therefore propose Dual-Dimensional Consistency (DDC), a unified framework that bridges path quality with adaptive termination. By coupling a Confidence-Weighted Bayesian protocol with Trend-Aware Stratified Pruning, our method concentrates computational resources on high-quality reasoning paths, filtering hallucinations while accelerating consensus. Evaluations across five benchmarks…
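The abstract describes the Confidence-Weighted Bayesian protocol only at a high level, so what follows is a minimal, hypothetical sketch of the general idea (confidence-weighted consensus with adaptive early stopping), not the authors' algorithm. It assumes each sampled reasoning path yields a final answer plus a confidence score in [0, 1]. The function names, the Beta-posterior stopping rule, and the `prior`, `threshold`, and `min_samples` parameters are illustrative assumptions; Trend-Aware Stratified Pruning is not modeled.

```python
import math
from collections import defaultdict


def beta_tail_above_half(a: float, b: float, steps: int = 2000) -> float:
    """Return P(theta > 0.5) for theta ~ Beta(a, b) using midpoint-rule integration.

    Assumes a >= 1 and b >= 1 so the density stays bounded on [0.5, 1].
    """
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    h = 0.5 / steps
    total = 0.0
    for i in range(steps):
        x = 0.5 + (i + 0.5) * h  # midpoint of the i-th sub-interval of [0.5, 1]
        total += math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log1p(-x))
    return total * h


def confidence_weighted_vote(samples, prior=1.0, threshold=0.95, min_samples=3):
    """Hypothetical sketch: consume (answer, confidence) pairs until the leader is likely.

    Each path adds its confidence (not a flat vote) to its answer. Sampling stops once
    the Beta posterior says the leader beats the runner-up with probability >= threshold.
    Assumes prior >= 1 and non-negative confidences. Returns (answer, samples_used).
    """
    weights = defaultdict(float)
    used = 0
    for answer, confidence in samples:
        used += 1
        weights[answer] += confidence
        if used < min_samples:
            continue
        ranked = sorted(weights.values(), reverse=True)
        lead = ranked[0]
        runner_up = ranked[1] if len(ranked) > 1 else 0.0
        # Leader's share of (leader + runner-up) mass under a symmetric Dirichlet prior.
        if beta_tail_above_half(prior + lead, prior + runner_up) >= threshold:
            break  # adaptive termination: stop drawing more reasoning paths
    if not weights:
        return None, 0
    return max(weights, key=weights.get), used


if __name__ == "__main__":
    # Three low-confidence paths agree on one answer, two high-confidence paths on another.
    paths = [("wrong", 0.1)] * 3 + [("right", 0.9)] * 2
    print(confidence_weighted_vote(paths))  # ('right', 5)
```

A flat majority vote over the same five paths would return "wrong" three to two; weighting by confidence lets the two high-confidence paths outvote the three low-confidence ones, which illustrates how confidence weighting can avoid reinforcing a frequent but poorly supported answer. With many confident, agreeing paths, the stopping rule ends sampling early instead of consuming the whole budget.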
