MEPIC: Memory Efficient Position Independent Caching for LLM Serving

Qian Wang; Zahra Yousefijamarani; Morgan Lindsay Heisler; Rongzhi Gu; Bai Xiaolong; Shan Yizhou; Wei Zhang; Wang Lan; Ying Xiong; Yong Zhang; Zhenan Fan

arXiv:2512.16822·cs.LG·December 19, 2025

TL;DR

MEPIC is a memory-efficient position-independent caching (PIC) system for large language model serving. It reduces KV cache memory usage by up to 2x compared to the state of the art, and by up to 5x for long prompts, at comparable latency and accuracy and without modifying the model.

Contribution

It proposes techniques for reusing chunk KV across positions and across requests: chunk KV is aligned to memory pages and stored without positional encodings, which reduces memory consumption.
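
To illustrate why page alignment matters for sharing, here is a minimal sketch. It is not MEPIC's implementation: the page size, the function names, and the use of token IDs as a stand-in for KV page contents are all assumptions made for illustration, and it models only the memory layout, not the KV values themselves.

```python
PAGE = 4  # tokens per KV page (hypothetical size, chosen for illustration)

def paginate(tokens, page=PAGE):
    """Split a token sequence into fixed-size pages; the last may be partial."""
    return [tuple(tokens[i:i + page]) for i in range(0, len(tokens), page)]

def layout_unaligned(prefix, chunk):
    """The chunk is appended directly after the prefix, so its offset
    within a page depends on the prefix length."""
    return paginate(prefix + chunk)

def layout_aligned(prefix, chunk):
    """The chunk starts on a fresh page; the prefix's last page may be
    left partly empty."""
    return paginate(prefix) + paginate(chunk)

# The same 8-token chunk appears after two prefixes of different length.
chunk = list(range(100, 108))
prefix_a = [1, 2, 3]
prefix_b = [4, 5, 6, 7, 8]

# Pages with identical contents in both requests are candidates for sharing.
shared_unaligned = set(layout_unaligned(prefix_a, chunk)) & set(layout_unaligned(prefix_b, chunk))
shared_aligned = set(layout_aligned(prefix_a, chunk)) & set(layout_aligned(prefix_b, chunk))

print(len(shared_unaligned), len(shared_aligned))
```

In the unaligned layout the chunk straddles page boundaries differently in each request, so no page is identical across the two. In the aligned layout both of the chunk's pages match and could be backed by the same memory, at the cost of some padding in the prefix's last page.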

Findings

01 Reduces KV cache memory usage by up to 2x compared to the state of the art.

02 Achieves up to 5x memory savings for long prompts.

03 Maintains comparable latency and accuracy without model modifications.

Abstract

Modern LLM applications such as deep-research assistants, coding agents, and Retrieval-Augmented Generation (RAG) systems repeatedly process long prompt histories containing shared document or code chunks. This puts significant pressure on the Key-Value (KV) cache, which must operate within limited memory while sustaining high throughput and low latency. Prefix caching alleviates some of these costs by reusing the KV cache for previously processed tokens, but it is limited by strict prefix matching. Position-independent caching (PIC) enables chunk-level reuse at arbitrary positions, but requires selective recomputation and positional-encoding (PE) adjustments. Because these operations vary across queries, the KV for the same chunk diverges across requests. Moreover, without page alignment, chunk KV layouts diverge in memory, preventing page sharing. These issues result in only…
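
The difference between strict prefix matching and chunk-level reuse can be sketched with cache keys. This is an illustrative sketch only, not the lookup scheme of MEPIC or of any particular serving system: the block size, the hash-chaining scheme, and the function names are assumptions.

```python
import hashlib

BLOCK = 4  # tokens per cache block (hypothetical size)

def _digest(parts):
    """Hash a sequence of values into a hex string."""
    h = hashlib.sha256()
    for p in parts:
        h.update(str(p).encode())
        h.update(b"|")
    return h.hexdigest()

def prefix_keys(tokens):
    """Prefix-style keys: each full block's key is chained to the key of
    the block before it, so it depends on every preceding token."""
    keys, parent = [], ""
    for i in range(0, len(tokens) - len(tokens) % BLOCK, BLOCK):
        parent = _digest([parent, *tokens[i:i + BLOCK]])
        keys.append(parent)
    return keys

def chunk_key(chunk_tokens):
    """Position-independent key: depends only on the chunk's own content."""
    return _digest(chunk_tokens)

# The same document chunk appears at different offsets in two requests.
doc = [11, 12, 13, 14, 15, 16, 17, 18]
req_a = [1, 2, 3, 4] + doc
req_b = [5, 6, 7, 8, 9, 10, 3, 4] + doc

# Prefix matching: the first blocks differ, so no chained key matches.
shared_prefix_blocks = len(set(prefix_keys(req_a)) & set(prefix_keys(req_b)))

# Chunk-level lookup: the chunk is found regardless of where it sits.
pic_hit = chunk_key(req_a[4:12]) == chunk_key(req_b[8:16])

print(shared_prefix_blocks, pic_hit)
```

A content-only key finds the chunk, but as the abstract notes, the cached KV still needs selective recomputation and positional-encoding adjustment before it can be used at a new position. The key lookup is only the first step.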

Taxonomy

Topics: Caching and Content Delivery · Advanced Data Storage Technologies · Distributed systems and fault tolerance