ConfSpec: Efficient Step-Level Speculative Reasoning via Confidence-Gated Verification
Siran Liu, Cyril Y. He

TL;DR
ConfSpec is a confidence-gated verification framework that accelerates step-level reasoning in large language models. It reports up to 2.24× speedup while maintaining target-model accuracy by escalating only uncertain reasoning steps to the target model.
Contribution
It presents a novel cascaded verification approach that leverages small draft models for high-confidence decisions, reducing inference latency without external judges.
Findings
Up to 2.24× speedup in inference time
Maintains target-model accuracy with verification
Compatible with existing speculative decoding methods
Abstract
Chain-of-Thought reasoning significantly improves the performance of large language models on complex tasks, but incurs high inference latency due to long generation traces. Step-level speculative reasoning aims to mitigate this cost, yet existing approaches face a long-standing trade-off among accuracy, inference speed, and resource efficiency. We propose ConfSpec, a confidence-gated cascaded verification framework that resolves this trade-off. Our key insight is an asymmetry between generation and verification: generating a correct reasoning step requires substantial model capacity, whereas step-level verification is a constrained discriminative task for which small draft models are well-calibrated within their competence range. High-confidence draft decisions can therefore be accepted directly, while uncertain cases are selectively escalated to the large target model. Evaluation across…
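The gating logic described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the callables `draft_generate`, `draft_verify`, `target_verify`, and `target_generate`, the `threshold` parameter, the `CascadeStats` counters, and the choice to have the target model regenerate a rejected step are all hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Context = List[str]


@dataclass
class CascadeStats:
    """Hypothetical counters showing where each verification decision was made."""
    draft_decisions: int = 0   # draft verdict trusted (confidence >= threshold)
    escalations: int = 0       # verdict deferred to the target model
    target_rewrites: int = 0   # rejected steps regenerated by the target model


def confidence_gated_step(
    context: Context,
    draft_generate: Callable[[Context], str],
    draft_verify: Callable[[Context, str], Tuple[bool, float]],
    target_verify: Callable[[Context, str], bool],
    target_generate: Callable[[Context], str],
    threshold: float,
    stats: CascadeStats,
) -> str:
    """Produce one reasoning step using a confidence-gated verification cascade."""
    step = draft_generate(context)                     # cheap proposal
    accept, confidence = draft_verify(context, step)   # cheap verdict + confidence

    if confidence >= threshold:
        stats.draft_decisions += 1                     # trust the draft's verdict
    else:
        stats.escalations += 1                         # uncertain: ask the target
        accept = target_verify(context, step)

    if accept:
        return step
    # Assumption: a rejected step is regenerated by the target model.
    stats.target_rewrites += 1
    return target_generate(context)


def run_demo() -> Tuple[Context, CascadeStats]:
    """Toy run with stub models; the stubs encode verdicts in the step text."""
    proposals = iter(["a", "b?", "c!"])

    def draft_generate(ctx: Context) -> str:
        return next(proposals)

    def draft_verify(ctx: Context, step: str) -> Tuple[bool, float]:
        if step.endswith("?"):
            return True, 0.55    # low confidence, will be escalated
        if step.endswith("!"):
            return False, 0.95   # confident rejection
        return True, 0.97        # confident acceptance

    def target_verify(ctx: Context, step: str) -> bool:
        return False             # stub target rejects whatever it is asked about

    def target_generate(ctx: Context) -> str:
        return f"target-step-{len(ctx)}"

    stats = CascadeStats()
    context: Context = []
    for _ in range(3):
        context.append(confidence_gated_step(
            context, draft_generate, draft_verify,
            target_verify, target_generate, threshold=0.9, stats=stats))
    return context, stats


if __name__ == "__main__":
    print(run_demo())
```

In this sketch the target model is consulted only when the draft's confidence falls below `threshold`, which is where any latency saving would come from. How the confidence score is computed and calibrated is left abstract here.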
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Explainable Artificial Intelligence (XAI)
