You Need an Encoder for Native Position-Independent Caching

Shiju Zhao; Junhao Hu; Jiaqi Zheng; Guihai Chen

arXiv:2602.01519·cs.LG·February 3, 2026

Open Access

TL;DR

This paper introduces native position-independent caching (PIC) for decoder-only LLMs by reintroducing an encoder and explicitly training it to support PIC, reducing Time-to-First-Token (TTFT) by 51-94% and tripling throughput with comparable accuracy.

Contribution

The paper proposes native PIC, which reintroduces an encoder to decoder-only LLMs and trains it explicitly for position-independent KV reuse, and develops COMB, a PIC-aware caching system that integrates with existing inference frameworks.

Findings

01 Reduces Time-to-First-Token (TTFT) by 51-94%

02 Increases inference throughput by 3x

03 Maintains accuracy comparable to existing methods

Abstract

The Key-Value (KV) cache of Large Language Models (LLMs) is prefix-based, making it highly inefficient for processing contexts retrieved in arbitrary order. Position-Independent Caching (PIC) has been proposed to enable KV reuse without positional constraints; however, existing approaches often incur substantial accuracy degradation, limiting their practical adoption. To address this issue, we propose native PIC by reintroducing the encoder to prevalent decoder-only LLMs and explicitly training it to support PIC. We further develop COMB, a PIC-aware caching system that integrates seamlessly with existing inference frameworks. Experimental results show that COMB reduces Time-to-First-Token (TTFT) by 51-94% and increases throughput by 3$\times$ with comparable accuracy. Furthermore, the quality improvement when using DeepSeek-V2-Lite-Chat demonstrates the applicability of COMB to other…
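The abstract's first claim, that a prefix-based cache handles contexts retrieved in arbitrary order poorly, can be illustrated with a toy lookup model. The sketch below is not COMB and does not involve an encoder or real KV tensors; it only counts cache hits under two hypothetical keying schemes. The names `prefix_keys`, `pic_keys`, and `count_hits`, and the `doc_*` chunk strings, are assumptions made for illustration.

```python
import hashlib


def _digest(text: str) -> str:
    """Stable hash used as a stand-in for a cache key."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def prefix_keys(chunks):
    """Prefix-based keying: the key for chunk i depends on chunks 0..i."""
    keys, running = [], ""
    for chunk in chunks:
        running = _digest(running + "\x00" + chunk)
        keys.append(running)
    return keys


def pic_keys(chunks):
    """Position-independent keying: the key depends on the chunk alone."""
    return [_digest(chunk) for chunk in chunks]


def count_hits(requests, key_fn):
    """Replay requests against an initially empty cache and count hits."""
    cache, hits = set(), 0
    for chunks in requests:
        for key in key_fn(chunks):
            if key in cache:
                hits += 1
            else:
                cache.add(key)
    return hits


# Three requests that retrieve the same documents in different orders.
requests = [
    ["doc_a", "doc_b", "doc_c"],
    ["doc_c", "doc_a", "doc_b"],
    ["doc_a", "doc_c", "doc_b"],
]

prefix_hits = count_hits(requests, prefix_keys)
pic_hits = count_hits(requests, pic_keys)
print(f"prefix-based hits: {prefix_hits}, position-independent hits: {pic_hits}")
```

In this toy setup the prefix scheme reuses an entry only when a request starts with the same chunk sequence as an earlier one, while the position-independent scheme reuses every chunk it has seen before. What the sketch leaves out is the hard part the abstract points to: reusing a chunk's KV entries out of position can degrade accuracy, which is the problem native PIC is trained to address.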

Taxonomy

Topics: Caching and Content Delivery · Data Quality and Management · Advanced Neural Network Applications