StepCache: Step-Level Reuse with Lightweight Verification and Selective Patching for LLM Serving

Azam Nouri

arXiv:2603.28795·cs.OS·April 1, 2026

TL;DR

StepCache is a step-level reuse layer for LLM serving. It improves efficiency and preserves correctness by verifying cached steps and regenerating only the ones that fail, with dedicated handling for structured JSON output and for linear-equation tasks.

Contribution

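The paper introduces a backend-agnostic, step-level reuse layer with lightweight verification and selective patching, improving serving performance and correctness over prior caching methods that reuse either full responses (semantic caching) or model-internal KV/prefix states.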

Findings

01. Reduces mean latency from 2.13 s to 0.67 s

02. Achieves 100% correctness with verification and patching

03. 79.7% of requests take the fast reuse-only path

Abstract

We address LLM serving workloads where repeated requests share a common solution structure but differ in localized constraints, such as output schema, variable names, or numeric constants. Prior caching approaches typically reuse either full responses (semantic caching) or model-internal KV/prefix states, which are respectively brittle under partial changes or tightly coupled to specific backends. We present StepCache, a backend-agnostic step-level reuse layer that segments outputs into ordered steps, retrieves the best-matching cached request, verifies steps using lightweight task-aware checks, and regenerates only failing regions via selective patching. StepCache additionally supports strict structured-output enforcement for JSON, including single-step extraction, required-key constraints, and one-shot repair, as well as conservative skip-reuse fallbacks for semantic changes. For…
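
The segment, retrieve, verify, and patch flow described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the names `CacheEntry`, `segment`, `retrieve`, and `serve` are hypothetical, steps are assumed to be one non-empty line each, prompt similarity is approximated with the standard library's `difflib.SequenceMatcher`, and the `min_score` threshold stands in for the conservative skip-reuse fallback. The `verify`, `regenerate`, and `generate` callables are placeholders for the task-aware checks and backend calls.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Callable, List, Optional, Tuple


@dataclass
class CacheEntry:
    """A cached request: the prompt plus its response split into ordered steps."""
    prompt: str
    steps: List[str]


def segment(output: str) -> List[str]:
    """Split a response into ordered steps (assumption: one non-empty line per step)."""
    return [line.strip() for line in output.splitlines() if line.strip()]


def retrieve(cache: List[CacheEntry], prompt: str,
             min_score: float = 0.6) -> Optional[CacheEntry]:
    """Return the best-matching cached entry, or None if nothing is similar enough.

    Character-level similarity is a stand-in for whatever retrieval the real
    system uses; min_score is an arbitrary illustrative threshold.
    """
    best, best_score = None, min_score
    for entry in cache:
        score = SequenceMatcher(None, entry.prompt, prompt).ratio()
        if score >= best_score:
            best, best_score = entry, score
    return best


def serve(cache: List[CacheEntry], prompt: str,
          verify: Callable[[str, int, str], bool],
          regenerate: Callable[[str, int, str], str],
          generate: Callable[[str], str]) -> Tuple[List[str], str]:
    """Serve a request and report which path was taken: miss, reuse, or patched."""
    entry = retrieve(cache, prompt)
    if entry is None:
        # No usable match: fall back to full generation and cache the result.
        steps = segment(generate(prompt))
        cache.append(CacheEntry(prompt, steps))
        return steps, "miss"

    steps, patched = [], False
    for index, step in enumerate(entry.steps):
        if verify(prompt, index, step):
            steps.append(step)  # step passes the check: reuse as-is
        else:
            steps.append(regenerate(prompt, index, step))  # patch only this step
            patched = True
    return steps, ("patched" if patched else "reuse")


# Toy usage with stand-in callables (no model is called).
cache = [CacheEntry("Solve for x: 2x = 8", ["Divide both sides by 2", "x = 4"])]


def toy_verify(prompt: str, index: int, step: str) -> bool:
    # A cached step fails if it mentions the variable x but the new prompt does not.
    return "x" not in step or "x" in prompt


steps, path = serve(
    cache,
    "Solve for y: 2y = 8",
    verify=toy_verify,
    regenerate=lambda prompt, index, step: "y = 4",  # placeholder for a backend call
    generate=lambda prompt: "unused",
)
print(steps, path)  # ['Divide both sides by 2', 'y = 4'] patched
```

In this toy run the first cached step is reused unchanged and only the second is regenerated, which is the behavior selective patching aims for when a request differs from a cached one only in a localized detail such as a variable name.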
