ContextPilot: Fast Long-Context Inference via Context Reuse

Yinsicheng Jiang; Yeqi Huang; Liang Cheng; Cheng Deng; Xuan Sun; Luo Mai

arXiv:2511.03475 · cs.LG · May 7, 2026

TL;DR

ContextPilot is a system that accelerates long-context LLM inference by reusing overlapping context blocks across interactions, reducing prefill latency by up to 3x while maintaining or improving reasoning quality.

Contribution

ContextPilot introduces context reuse as a mechanism for faster long-context inference, combining a context index, context ordering, de-duplication, and annotations to improve speed without sacrificing reasoning quality.

Findings

01. Reduces LLM prefill latency by up to 3x

02. Preserves reasoning quality during context reuse

03. Can improve reasoning quality at longer context lengths

Abstract

AI applications increasingly depend on long-context inference, where LLMs consume substantial context to support stronger reasoning. Common examples include retrieval-augmented generation, agent memory layers, and multi-agent orchestration. As input contexts get longer, prefill latency becomes the main bottleneck. Yet today's prefill acceleration techniques face a trade-off: they either preserve reasoning quality but deliver little KV-cache reuse, or improve reuse at the cost of degraded reasoning quality. We present ContextPilot, a system that accelerates prefill by introducing context reuse as a new mechanism for faster long-context inference. ContextPilot introduces a context index to identify overlapping context blocks across LLM interactions (e.g., across users and turns). It further proposes context ordering and de-duplication techniques to maximize KV-cache reuse. To preserve…
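
The idea of indexing, de-duplicating, and ordering context blocks can be illustrated with a small sketch. This is not ContextPilot's implementation: the function names (`block_id`, `build_index`, `reorder`, `shared_prefix_len`), the hash-based block identifiers, and the "most widely shared blocks first" ordering rule are all assumptions chosen for illustration. The sketch relies only on the general fact that prefix-based KV-cache reuse benefits when requests begin with the same sequence of blocks, and it does not model the annotations mentioned above.

```python
import hashlib


def block_id(text: str) -> str:
    """Hypothetical block identifier: a short content hash."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]


def build_index(requests):
    """Map each block id to the set of request indices that contain it."""
    index = {}
    for i, blocks in enumerate(requests):
        for b in blocks:
            index.setdefault(block_id(b), set()).add(i)
    return index


def reorder(requests, index):
    """De-duplicate blocks within each request, then put widely shared blocks first.

    Ties are broken by block id so that every request sorts shared blocks
    in the same relative order (an assumed, simplistic ordering rule).
    """
    ordered = []
    for blocks in requests:
        unique = list(dict.fromkeys(blocks))  # drop repeats, keep first occurrence
        unique.sort(key=lambda b: (-len(index[block_id(b)]), block_id(b)))
        ordered.append(unique)
    return ordered


def shared_prefix_len(a, b):
    """Number of leading blocks two requests have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


# Two toy requests that overlap on docA and docC but list them differently.
requests = [["docA", "docB", "docC"], ["docC", "docA", "docD", "docA"]]
index = build_index(requests)
ordered = reorder(requests, index)

before = shared_prefix_len(requests[0], requests[1])
after = shared_prefix_len(ordered[0], ordered[1])
print(before, after)
```

In this toy case the two requests share no leading blocks as written, but after de-duplication and reordering both start with the same two shared blocks, which is the situation in which a prefix cache could serve them from one cached computation. Reordering context can itself change model behavior, which the sketch makes no attempt to address.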

Code & Models

Repositories

EfficientContext/ContextPilot (GitHub)