Latent Phase-Shift Rollback: Inference-Time Error Correction via Residual Stream Monitoring and KV-Cache Steering

Manan Gupta; Dhruv Kumar

arXiv:2604.18567 · cs.LG · April 21, 2026

TL;DR

The paper introduces Latent Phase-Shift Rollback (LPSR), an inference-time error-correction method for large language models that detects and corrects reasoning errors without fine-tuning, raising MATH-500 accuracy with an 8B model from 28.8% to 44.0%.

Contribution

LPSR is presented as the first method to perform inference-time error correction via residual stream monitoring and KV-cache steering without fine-tuning, gradient computation, or additional forward passes.

Findings

01

LPSR improves MATH-500 accuracy from 28.8% to 44.0%.

02

LPSR outperforms prompted self-correction (19.8%, itself below standard AR at 28.8%) by 24.2 pp, and also outperforms the other baselines.

03

Detection and correction layers differ, with optimal detection at layer 14 and correction at layer 16.
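
As a quick arithmetic check on Finding 01, the gain can be written out as below. The problem counts assume accuracy is measured over all 500 MATH-500 problems, which this page does not state. The symbols $b$ and $c$ (discordant pair counts) and the use of the continuity-corrected McNemar statistic are likewise assumptions, introduced only to show that the reported $\chi^2 = 66.96$ is consistent with these numbers.

```latex
\begin{align*}
\Delta_{\text{abs}} &= 44.0\% - 28.8\% = 15.2\ \text{pp} \\
\Delta_{\text{rel}} &= \frac{15.2}{28.8} \approx 52.8\% \\
n_{\text{LPSR}} &= 0.440 \times 500 = 220, \qquad
n_{\text{AR}} = 0.288 \times 500 = 144, \qquad
b - c = 220 - 144 = 76 \\
\chi^2 &= \frac{(|b - c| - 1)^2}{b + c}
        = \frac{75^2}{84} \approx 66.96
        \quad \text{if } b + c = 84
\end{align*}
```

The value $b + c = 84$ is inferred here from the reported statistic and is not a figure taken from the paper.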

Abstract

Large language models frequently commit unrecoverable reasoning errors mid-generation: once a wrong step is taken, subsequent tokens compound the mistake rather than correct it. We introduce Latent Phase-Shift Rollback (LPSR): at each generation step, we monitor the residual stream at a critical layer $l_{\text{crit}}$, detect abrupt directional reversals (phase shifts) via a cosine-similarity + entropy dual gate, and respond by rolling back the KV-cache and injecting a pre-computed steering vector. No fine-tuning, gradient computation, or additional forward passes are required. LPSR achieves $44.0\%$ on MATH-500 with an 8B model versus $28.8\%$ for standard AR ($+15.2$ pp; McNemar $\chi^{2} = 66.96$, $p < 10^{-15}$). Critically, prompted self-correction, the most natural inference-time baseline, scores only $19.8\%$, below standard AR; LPSR exceeds it by $+24.2$ pp ($\chi^2 =…
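
The dual gate and rollback described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the cosine similarity is taken between the residual-stream vectors of consecutive generation steps, that the entropy gate fires on high next-token entropy, and that the steering vector is added to the current hidden state. All function names, thresholds, and the list-based stand-in for the KV-cache are hypothetical.

```python
import numpy as np


def cosine_similarity(a, b, eps=1e-12):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps))


def token_entropy(logits):
    """Shannon entropy (in nats) of the softmax distribution over logits."""
    z = logits - np.max(logits)          # subtract max for numerical stability
    p = np.exp(z) / np.sum(np.exp(z))
    return float(-np.sum(p * np.log(p + 1e-12)))


def phase_shift_detected(h_prev, h_curr, logits,
                         cos_threshold=0.0, entropy_threshold=2.0):
    """Dual gate: fire only when BOTH conditions hold.

    Assumption: h_prev and h_curr are residual-stream vectors at the
    monitored layer for consecutive steps; thresholds are hypothetical.
    """
    reversed_direction = cosine_similarity(h_prev, h_curr) < cos_threshold
    uncertain = token_entropy(logits) > entropy_threshold
    return reversed_direction and uncertain


def rollback_and_steer(kv_cache, h_curr, steering_vector,
                       rollback_steps=1, alpha=1.0):
    """Drop the last `rollback_steps` cache entries and steer the state.

    Assumption: the cache is modeled as a plain Python list with one entry
    per generated token, and steering is a scaled vector addition.
    """
    keep = max(len(kv_cache) - rollback_steps, 0)
    return kv_cache[:keep], h_curr + alpha * steering_vector
```

Requiring both gates to fire is one plausible reading of "dual gate": a direction flip alone may be a benign topic change, while a flip combined with an uncertain next-token distribution is treated here as a likelier sign of a reasoning error.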
