interwhen: A Generalizable Framework for Steering Reasoning Models with Test-time Verification

Vishak K Bhat; Prateek Chanda; Vijval Ekbote; Ashmit Khandelwal; Maitreyi Swaroop; Vineeth N. Balasubramanian; Subbarao Kambhampati; Nagarajan Natarajan; Amit Sharma

arXiv:2602.11202·cs.LO·May 14, 2026


TL;DR

interwhen is a test-time verification framework that steers reasoning models with feedback on their intermediate reasoning traces, using automatically generated verifiers to improve accuracy and policy adherence.

Contribution

The paper introduces a plug-and-play verification system that monitors reasoning traces and synthesizes verifiers from natural-language policies, improving model reliability without fine-tuning.

Findings

01. Achieves near-perfect accuracy on reasoning benchmarks with minimal token usage.

02. Improves task completion rates from 32% to 87% on telecom domain benchmarks.

03. Enables policy compliance and correctness in reasoning models through automatic verifier synthesis.

Abstract

Reasoning models produce long traces of intermediate decisions and tool calls, making test-time verification important for ensuring correctness. Existing approaches either verify only the final answer, which misses early errors, or rely on branch-and-verify strategies that explore multiple trajectories. We introduce interwhen, a single-trajectory verification framework that steers model behavior by providing feedback on intermediate reasoning traces. It addresses two key challenges. First, given a set of verifiers, obtaining verifiable states from the reasoning trace typically requires prompt engineering or external task decomposition into fixed steps. Instead, we propose a monitoring system that periodically polls the reasoning trace and forks inference of the reasoning model to recover intermediate states. Verifiers are run asynchronously alongside generation, adding negligible…
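The monitoring idea in the abstract (poll the trace periodically, recover an intermediate state, verify it alongside generation, and feed the result back into the same trajectory) can be illustrated with a toy sketch. This is not the interwhen API: every name below (`recover_state`, `check`, `monitored_generation`, `poll_every`) is hypothetical, the "model" is just a list of pre-written chunks, and state recovery is a string scan standing in for forking inference of the reasoning model.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Iterable, List, Optional


def recover_state(trace: List[str]) -> Optional[int]:
    """Hypothetical stand-in for forking inference: return the most recent
    integer mentioned in the trace, or None if there is none."""
    for chunk in reversed(trace):
        for token in reversed(chunk.split()):
            if token.isdigit():
                return int(token)
    return None


def check(snapshot: List[str], verifier: Callable[[int], bool]) -> Optional[str]:
    """Run the verifier on the state recovered from a trace snapshot.
    Returns a feedback message on failure, None otherwise."""
    state = recover_state(snapshot)
    if state is None or verifier(state):
        return None
    return f"[verifier] state {state} violates the constraint; revise."


def monitored_generation(
    chunks: Iterable[str],
    verifier: Callable[[int], bool],
    poll_every: int = 2,
) -> List[str]:
    """Single-trajectory loop: every `poll_every` chunks, submit a check to a
    worker thread; its result is collected before the next chunk is appended."""
    trace: List[str] = []
    pending: Optional[Future] = None
    with ThreadPoolExecutor(max_workers=1) as pool:
        for i, chunk in enumerate(chunks, start=1):
            if pending is not None:
                feedback = pending.result()  # verifier ran in the background
                pending = None
                if feedback:
                    trace.append(feedback)  # inject feedback into the trace
            trace.append(chunk)
            if i % poll_every == 0:
                # Snapshot the trace so the worker sees a stable copy.
                pending = pool.submit(check, list(trace), verifier)
        if pending is not None:
            feedback = pending.result()
            if feedback:
                trace.append(feedback)
    return trace
```

In this toy, the feedback is only recorded in the trace; a real system would condition the model's subsequent generation on it. The single worker thread is one simple way to let a check overlap with the next generation step, chosen here only for illustration.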

Peer Reviews

No public reviews on file for this paper yet.

Code & Models

Repositories

microsoft/interwhen · github

Videos

Introducing Interwhen: Steering reasoning agents with real-time verification · youtube

Microsoft Research Forum | Season 2, Episode 4 · youtube