interwhen: A Generalizable Framework for Steering Reasoning Models with Test-time Verification
Vishak K Bhat, Prateek Chanda, Vijval Ekbote, Ashmit Khandelwal, Maitreyi Swaroop, Vineeth N. Balasubramanian, Subbarao Kambhampati, Nagarajan Natarajan, Amit Sharma

TL;DR
interwhen is a versatile test-time verification framework that guides reasoning models using intermediate feedback and automatically generated verifiers, significantly enhancing accuracy and policy adherence.
Contribution
It introduces a novel, plug-and-play verification system that monitors reasoning traces and synthesizes verifiers from natural language policies, improving model reliability without fine-tuning.
Findings
Achieves near-perfect accuracy on reasoning benchmarks with minimal token usage.
Improves task completion rates from 32% to 87% on telecom domain benchmarks.
Enables policy compliance and correctness in reasoning models through automatic verifier synthesis.
Abstract
Reasoning models produce long traces of intermediate decisions and tool calls, making test-time verification important for ensuring correctness. Existing approaches either verify only the final answer, which misses early errors, or rely on branch-and-verify strategies that explore multiple trajectories. We introduce interwhen, a single-trajectory verification framework that steers model behavior by providing feedback on intermediate reasoning traces. It addresses two key challenges. First, given a set of verifiers, obtaining verifiable states from the reasoning trace typically requires prompt engineering or external task decomposition into fixed steps. Instead, we propose a monitoring system that periodically polls the reasoning trace and forks inference of the reasoning model to recover intermediate states. Verifiers are run asynchronously alongside generation, adding negligible…
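The polling idea in the abstract can be illustrated with a toy sketch. This is not the interwhen implementation: every name below (`extract_state`, `verify_addition`, `monitor`, the `step:` prefix, `poll_every`) is hypothetical. The state extractor is a plain string filter standing in for forking the reasoning model, and the sketch only collects verifier feedback. It omits the step of feeding that feedback back to the model to steer it.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Iterable, List, Optional

STEP_PREFIX = "step:"  # hypothetical marker for a checkable intermediate step


def extract_state(trace: List[str]) -> Optional[str]:
    """Stand-in for forking the model: return the latest step seen in the trace."""
    steps = [chunk for chunk in trace if chunk.startswith(STEP_PREFIX)]
    return steps[-1] if steps else None


def verify_addition(state: str) -> Optional[str]:
    """Toy verifier for claims like 'step: 4 + 3 = 8'. Returns feedback, or None if correct."""
    expr, claimed = state[len(STEP_PREFIX):].split("=")
    left, right = expr.split("+")
    actual = int(left) + int(right)
    if actual != int(claimed):
        return f"{expr.strip()} equals {actual}, not {claimed.strip()}"
    return None


def monitor(
    chunks: Iterable[str],
    verifier: Callable[[str], Optional[str]],
    poll_every: int = 2,
) -> List[str]:
    """Poll the growing trace every `poll_every` chunks and verify in a worker thread."""
    trace: List[str] = []
    pending: List[Future] = []
    last_state: Optional[str] = None
    with ThreadPoolExecutor(max_workers=1) as pool:
        for i, chunk in enumerate(chunks, start=1):
            trace.append(chunk)  # generation continues while verifiers run
            if i % poll_every == 0:
                state = extract_state(trace)
                if state is not None and state != last_state:
                    pending.append(pool.submit(verifier, state))
                    last_state = state
        results = [future.result() for future in pending]
    return [r for r in results if r is not None]


# A simulated reasoning trace with one faulty intermediate step.
chunks = ["Let me add.", "step: 2 + 2 = 4", "Next.", "step: 4 + 3 = 8", "Done.", "answer: 8"]
feedback = monitor(chunks, verify_addition)
print(feedback)
```

Here the verifier flags the second step before the final answer is checked, which is the kind of early error that final-answer-only verification would miss.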
Videos
Introducing Interwhen: Steering reasoning agents with real-time verification · youtube
Microsoft Research Forum | Season 2, Episode 4 · youtube
