EPIC: Efficient Position-Independent Caching for Serving Large Language Models
Junhao Hu, Wenrui Huang, Weidong Wang, Haoyi Wang, Tiancheng Hu, Qin Zhang, Hao Feng, Xusheng Chen, Yizhou Shan, Tao Xie

TL;DR
EPIC is a serving system for large language models built on position-independent caching. It reuses cached intermediate representations (KV vectors) across requests even when the shared content follows different prefixes, which shortens Time-To-First-Token and raises throughput.
Contribution
The paper formalizes Position-Independent Caching (PIC) and presents EPIC, a serving system built around the new LegoLink algorithm. EPIC reuses KV vectors regardless of the prefix that precedes them, removing the exact-prefix-match requirement of prior context caching.
Findings
EPIC achieves up to 8x reduction in Time-To-First-Token.
EPIC attains 7x throughput gains over existing systems.
EPIC maintains accuracy with minimal or no loss.
Abstract
Large Language Models (LLMs) show great capabilities in a wide range of applications, but serving them efficiently becomes increasingly challenging as requests (prompts) become more complex. Context caching improves serving performance by reusing Key-Value (KV) vectors, the intermediate representations of tokens that are repeated across requests. However, existing context caching requires exact prefix matches across requests, limiting reuse cases in settings such as few-shot learning and retrieval-augmented generation, where immutable content (e.g., documents) remains unchanged across requests but is preceded by varying prefixes. Position-Independent Caching (PIC) addresses this issue by enabling modular reuse of the KV vectors regardless of prefixes. We formalize PIC and advance prior work by introducing EPIC, a serving system incorporating our new LegoLink algorithm, which mitigates…
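The difference between exact prefix matching and position-independent reuse can be illustrated with a toy token count. The sketch below is not EPIC or LegoLink. The function names (`prefix_reusable`, `pic_reusable`) and the example chunks are hypothetical, and it only counts how many tokens would be eligible for cache reuse under each policy. It does not model KV vectors, positional encodings, or how accuracy is preserved.

```python
from itertools import chain


def flatten(chunks):
    """Concatenate a list of token chunks into one token sequence."""
    return list(chain.from_iterable(chunks))


def prefix_reusable(cached_chunks, new_chunks):
    """Exact prefix matching: only the longest common token prefix is reusable."""
    cached, new = flatten(cached_chunks), flatten(new_chunks)
    n = 0
    for a, b in zip(cached, new):
        if a != b:
            break
        n += 1
    return n


def pic_reusable(cached_chunks, new_chunks):
    """Position-independent (toy model): any chunk seen before is reusable,
    no matter what precedes it in the new request."""
    cache = {tuple(c) for c in cached_chunks}
    return sum(len(c) for c in new_chunks if tuple(c) in cache)


# Hypothetical requests: same two documents, different question and order.
doc_a = ["doc", "A", "body"]
doc_b = ["doc", "B", "body"]
first = [["question", "one"], doc_a, doc_b]
second = [["query", "two"], doc_b, doc_a]

print(prefix_reusable(first, second))  # 0: the requests diverge at the first token
print(pic_reusable(first, second))     # 6: both documents are found in the cache
```

In this toy setting the varying question at the front of each request defeats prefix matching entirely, while chunk-level lookup still finds both documents. This mirrors the few-shot and retrieval-augmented cases the abstract describes.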
Taxonomy
Topics: Topic Modeling · Recommender Systems and Techniques · Speech and dialogue systems
