ConfSpec: Efficient Step-Level Speculative Reasoning via Confidence-Gated Verification

Siran Liu; Cyril Y. He

arXiv:2602.18447 · cs.CL · February 24, 2026



TL;DR

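ConfSpec introduces a confidence-gated verification framework that accelerates step-level speculative reasoning in large language models. It achieves up to 2.24× speedup while maintaining accuracy. The small draft model's high-confidence verification decisions are accepted directly, and only uncertain reasoning steps are escalated to the large target model.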

Contribution

ConfSpec presents a cascaded verification approach. The small draft model settles the verification decisions it is confident about, and uncertain steps are escalated to the target model. This reduces inference latency without requiring an external judge model.

Findings

01. Up to 2.24× inference speedup

02. Maintains the target model's accuracy through confidence-gated verification

03. Compatible with existing speculative decoding methods

Abstract

Chain-of-Thought reasoning significantly improves the performance of large language models on complex tasks, but incurs high inference latency due to long generation traces. Step-level speculative reasoning aims to mitigate this cost, yet existing approaches face a long-standing trade-off among accuracy, inference speed, and resource efficiency. We propose ConfSpec, a confidence-gated cascaded verification framework that resolves this trade-off. Our key insight is an asymmetry between generation and verification: while generating a correct reasoning step requires substantial model capacity, step-level verification is a constrained discriminative task for which small draft models are well-calibrated within their competence range, enabling high-confidence draft decisions to be accepted directly while selectively escalating uncertain cases to the large target model. Evaluation across…
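The cascade described in the abstract can be sketched as a short Python loop. This is a minimal illustration under stated assumptions, not the authors' implementation. The following are all hypothetical:

- the function names and the `Verdict` type;
- the single fixed confidence `threshold`;
- the rule that a rejected draft step is regenerated by the target model, a common pattern in step-level speculative reasoning.

The models are stand-in callables.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class Verdict:
    # Hypothetical output of the draft model acting as verifier.
    accept: bool       # draft's judgement: is the proposed step correct?
    confidence: float  # draft's confidence in that judgement, in [0, 1]

Trace = List[str]

def confidence_gated_verify(
    trace: Trace,
    step: str,
    draft_verify: Callable[[Trace, str], Verdict],
    target_verify: Callable[[Trace, str], bool],
    threshold: float = 0.9,  # assumed fixed gate; not specified in the abstract
) -> Tuple[bool, bool]:
    """Return (accepted, escalated) for one proposed reasoning step."""
    verdict = draft_verify(trace, step)
    if verdict.confidence >= threshold:
        # Confident draft decision: accept or reject without the target model.
        return verdict.accept, False
    # Uncertain case: escalate verification to the large target model.
    return target_verify(trace, step), True

def speculative_reason(
    draft_step: Callable[[Trace], str],
    target_step: Callable[[Trace], str],
    draft_verify: Callable[[Trace, str], Verdict],
    target_verify: Callable[[Trace, str], bool],
    max_steps: int,
    threshold: float = 0.9,
) -> Tuple[Trace, int]:
    """Build a reasoning trace step by step; return it with the escalation count."""
    trace: Trace = []
    escalations = 0
    for _ in range(max_steps):
        candidate = draft_step(trace)  # cheap draft proposal
        accepted, escalated = confidence_gated_verify(
            trace, candidate, draft_verify, target_verify, threshold)
        escalations += int(escalated)
        # Assumption: a rejected draft step is regenerated by the target model.
        trace.append(candidate if accepted else target_step(trace))
    return trace, escalations

# Toy stand-ins for the models (hypothetical, for demonstration only).
def toy_draft_step(trace: Trace) -> str:
    return f"d{len(trace)}"

def toy_target_step(trace: Trace) -> str:
    return f"t{len(trace)}"

def toy_draft_verify(trace: Trace, step: str) -> Verdict:
    # Confident on even-indexed steps, uncertain on odd-indexed ones.
    return Verdict(accept=True, confidence=0.95 if len(trace) % 2 == 0 else 0.5)

def toy_target_verify(trace: Trace, step: str) -> bool:
    return len(trace) != 3  # the target rejects the fourth step

if __name__ == "__main__":
    print(speculative_reason(toy_draft_step, toy_target_step,
                             toy_draft_verify, toy_target_verify, max_steps=4))
    # → (['d0', 'd1', 'd2', 't3'], 2)
```

In this sketch the target model is called only when the draft's confidence falls below the threshold, or when a step must be regenerated. Raising `threshold` therefore trades speed for more target-model checks.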


Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Explainable Artificial Intelligence (XAI)