Zero-Overhead Introspection for Adaptive Test-Time Compute

Rohin Manvi; Joey Hong; Tim Seyde; Maxime Labonne; Mathias Lechner; Sergey Levine

arXiv:2512.01457 · cs.LG · December 24, 2025

TL;DR

ZIP-RC is a zero-overhead introspective method for large language models that predicts reward and cost in real time, enabling adaptive inference that improves efficiency and accuracy without extra computation.

Contribution

The paper presents ZIP-RC, a novel approach that reuses logits for joint reward and cost prediction during inference, eliminating additional overhead and enabling adaptive, cost-effective reasoning.
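
As a rough illustration of what "reusing logits" could look like, the sketch below splits one decoding step's logits into an ordinary next-token distribution and a joint distribution over (reward bin, cost bin). The contiguous block of reserved vocabulary positions, the bin counts, the bin values, and the function names are all assumptions made for this example; the paper's actual layout and training objective are not given in this summary.

```python
import numpy as np

def joint_reward_cost(logits, reserved_start, n_reward_bins, n_cost_bins):
    """Split one step's logits into next-token probabilities and a joint grid.

    Assumption: reserved tokens occupy a contiguous block starting at
    `reserved_start`, with one logit per (reward bin, cost bin) pair.
    """
    n_reserved = n_reward_bins * n_cost_bins
    block = slice(reserved_start, reserved_start + n_reserved)

    # Softmax over the reserved logits only gives the joint prediction.
    reserved = logits[block]
    z = np.exp(reserved - reserved.max())
    joint = (z / z.sum()).reshape(n_reward_bins, n_cost_bins)

    # Mask the reserved positions so they are never sampled as text.
    masked = logits.copy()
    masked[block] = -np.inf
    t = np.exp(masked - masked.max())
    token_probs = t / t.sum()
    return token_probs, joint

def expected_reward_and_cost(joint, reward_values, cost_values):
    """Marginalize the joint grid and take expectations over bin values."""
    p_reward = joint.sum(axis=1)  # marginal over cost bins
    p_cost = joint.sum(axis=0)    # marginal over reward bins
    return float(p_reward @ reward_values), float(p_cost @ cost_values)
```

Because both outputs come from the same logit vector, this reading needs no additional forward pass, which is the sense of "zero overhead" used here.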

Findings

1. Improves accuracy by up to 12% over majority voting.
2. Traces smooth Pareto frontiers between quality, compute, and latency.
3. Enables adaptive inference without extra models or inference overhead.
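
For readers unfamiliar with the majority-voting baseline named in the first finding, a minimal sketch follows. The second function, which weights each vote by a per-sample confidence, is a hypothetical variant included only to show how a confidence signal could change the outcome; it is not claimed to be the paper's selection rule.

```python
from collections import Counter, defaultdict

def majority_vote(answers):
    """Return the most frequent final answer among the samples."""
    return Counter(answers).most_common(1)[0][0]

def confidence_weighted_vote(answers, confidences):
    """Hypothetical variant: sum a confidence score per distinct answer."""
    totals = defaultdict(float)
    for answer, confidence in zip(answers, confidences):
        totals[answer] += confidence
    return max(totals, key=totals.get)

answers = ["42", "42", "17"]
print(majority_vote(answers))                               # 42
print(confidence_weighted_vote(answers, [0.1, 0.2, 0.9]))   # 17
```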

Abstract

Large language models excel at reasoning but lack key aspects of introspection, including anticipating their own success and the computation required to achieve it. Humans use real-time introspection to decide how much effort to invest, when to make multiple attempts, when to stop, and when to signal success or failure. Without this, LLMs struggle to make intelligent meta-cognition decisions. Test-time scaling methods like Best-of-N drive up cost and latency by using a fixed budget of samples regardless of the marginal benefit of each one at any point in generation, and the absence of confidence signals can mislead people, prevent appropriate escalation to better tools, and undermine trustworthiness. Learned verifiers or reward models can provide confidence estimates, but do not enable adaptive inference and add substantial cost by requiring extra models or forward passes. We present…
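
The contrast the abstract draws between a fixed Best-of-N budget and adaptive inference can be made concrete with a small sketch. The utility rule below (expected reward minus a per-token price times expected remaining tokens) and every name in it are assumptions chosen for illustration; the abstract is truncated before the method is described, so this should not be read as the authors' procedure.

```python
def utility(expected_reward, expected_remaining_tokens, cost_per_token):
    """Assumed trade-off: predicted reward minus the price of finishing."""
    return expected_reward - cost_per_token * expected_remaining_tokens

def prune_samples(samples, cost_per_token, min_utility=0.0):
    """Keep in-flight samples whose estimated utility exceeds a threshold.

    `samples` is a list of (sample_id, expected_reward, expected_remaining_tokens).
    If nothing clears the threshold, the single best sample is kept so that
    at least one generation finishes.
    """
    scored = [(sid, utility(r, c, cost_per_token)) for sid, r, c in samples]
    kept = [sid for sid, u in scored if u > min_utility]
    if not kept:
        kept = [max(scored, key=lambda pair: pair[1])[0]]
    return kept

in_flight = [("a", 0.9, 100), ("b", 0.2, 800), ("c", 0.6, 300)]
print(prune_samples(in_flight, cost_per_token=0.001))  # ['a', 'c']
print(prune_samples(in_flight, cost_per_token=0.01))   # ['a']
```

A fixed-budget Best-of-N run would instead finish all three samples regardless of these estimates.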

Peer Reviews

Decision · ICLR 2026 Poster

Reviewer 01 · Rating 2 · Confidence 4

Strengths

- The promise of the paper, "zero-overhead inference-time" control, is a very promising and timely research direction.
- The authors show that their approach is decently accurate at predicting the rewards during generation.
- The proposed approach achieves improvements, often significant, on a suite of reasoning benchmarks at the same cost as the baselines.

Weaknesses

- I found the paper extremely difficult to read. Beyond the promise of a "zero-overhead inference-time prediction of reward", I found it very hard to glean much, if any, insight into the core contributions of the paper beyond the fact that it makes use of extra logits at every step.
- The related work section is really lacking, given how active an area of research inference-time control of LMs is.
- Line 85: broken figure reference.
- Paragraph 074-085 of the introduction misses the mark when it co

Reviewer 02 · Rating 8 · Confidence 3

Strengths

1. The paper proposes the ZIP-RC mechanism, which leverages reserved tokens in the vocabulary to enable real-time prediction of reward and remaining tokens without adding extra forward passes. This idea is highly innovative. Combined with adaptive pruning, the approach significantly improves inference efficiency and is of substantial practical importance.
2. Experimental results show that ZIP-RC clearly outperforms the baseline methods without such modification, demonstrating the strong potential

Weaknesses

1. Although reserved tokens are used to avoid additional computation overhead, this strategy may still introduce some distribution shift. The paper would benefit from additional comparison or analysis of the extent of distribution shift before and after modifying the loss function.
2. The experiments are primarily conducted on relatively small models (mostly under 2B). It would strengthen the work to extend evaluation to larger-scale models to verify scalability. Additionally, the exper

Reviewer 03 · Rating 6 · Confidence 4

Strengths

1. The idea of using reserved vocabulary positions to produce auxiliary predictions with truly zero overhead is novel and elegant. Rather than requiring separate forward passes or additional models like most verifier approaches, ZIP-RC extracts rich signals from logits that would otherwise go unused. The joint modeling of reward and cost (rather than just scalar confidence) is a key insight that enables principled decision-making about the reward-cost tradeoff.
2. The authors clearly articulate

Weaknesses

1. The evaluation focuses exclusively on mathematical reasoning tasks. It's unclear whether ZIP-RC's benefits extend to other domains like creative writing, coding, or open-ended question answering where the reward structure and token length distributions may be very different. Mathematical problems have clear correctness labels and relatively predictable structure, which may make reward/cost prediction easier than in other domains.
2. The authors acknowledge that their method relies on having

Code & Models

Models

🤗
dataopsnick/Qwen3-4B-Instruct-2507-zip-rc
model · 6 dl

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Topic Modeling · Natural Language Processing Techniques