PrefixKV: Adaptive Prefix KV Cache is What Vision Instruction-Following Models Need for Efficient Generation

Ao Wang; Hui Chen; Jiaxin Li; Jianchao Tan; Kefeng Zhang; Xunliang Cai; Zijia Lin; Jungong Han; Guiguang Ding

arXiv:2412.03409 · cs.CV · October 20, 2025

TL;DR

PrefixKV is an adaptive, importance-based prefix key-value (KV) cache for large vision-language models. Rather than keeping the same cache size in every layer, it adapts how much each layer retains, aiming to preserve essential contextual information while reducing inference cost.

Contribution

The paper proposes an importance-based prefix KV cache method with adaptive layer-wise retention, which balances information preservation against inference efficiency in vision-language models.
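
The idea of adaptive layer-wise retention can be illustrated with a small sketch. This is not the authors' algorithm, which this page does not spell out. It is one possible way to turn per-layer importance scores into per-layer cache sizes under a single global budget. The function names, the use of cumulative importance share as the ranking criterion, and the binary search over a shared threshold are all assumptions made for illustration.

```python
import numpy as np


def kept_per_layer(cum_shares, p):
    # For each layer, keep the shortest prefix of its ranked entries
    # whose cumulative importance share reaches the threshold p.
    return [
        min(len(c), int(np.searchsorted(c, p, side="left")) + 1)
        for c in cum_shares
    ]


def adaptive_layer_budgets(importance, keep_ratio, iters=50):
    """Hypothetical helper: map per-layer importance scores to cache sizes.

    importance: list of 1-D arrays of non-negative scores, one per layer.
    keep_ratio: fraction of all cached entries to retain across layers.
    """
    cum_shares = []
    for scores in importance:
        ranked = np.sort(np.asarray(scores, dtype=float))[::-1]  # descending
        cum_shares.append(np.cumsum(ranked) / ranked.sum())

    budget = int(keep_ratio * sum(len(c) for c in cum_shares))
    if budget < len(cum_shares):
        raise ValueError("budget must allow at least one entry per layer")

    # Binary search for the largest shared threshold that fits the budget.
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if sum(kept_per_layer(cum_shares, mid)) <= budget:
            lo = mid
        else:
            hi = mid
    return kept_per_layer(cum_shares, lo)


# Toy input: one layer with concentrated importance, one with flat importance.
peaked = [0.9, 0.05, 0.03, 0.02]
flat = [0.25, 0.25, 0.25, 0.25]
sizes = adaptive_layer_budgets([peaked, flat], keep_ratio=0.5)
print(sizes)
```

With these toy scores the sketch returns `[1, 3]`: the layer whose importance is concentrated in one entry keeps a short prefix, while the layer with flat importance keeps a longer one. A uniform policy with the same total budget would keep two entries in each layer.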

Findings

01. Achieves state-of-the-art performance in inference efficiency and quality.

02. Demonstrates superior trade-offs between efficiency and generation quality.

03. Shows promising potential for practical deployment of LVLMs.

Abstract

Recently, large vision-language models (LVLMs) have rapidly gained popularity for their strong generation and reasoning capabilities given diverse multimodal inputs. However, these models incur significant computational and memory overhead during inference, which greatly hinders efficient deployment in practical scenarios. The extensive key-value (KV) cache, necessitated by the lengthy input and output sequences, notably contributes to the high inference cost. Based on this, recent works have investigated ways to reduce the KV cache size for higher efficiency. Although effective, they generally overlook the distinct importance distributions of KV vectors across layers and maintain the same cache size for each layer during next-token prediction. This results in significant loss of contextual information for certain layers, leading to a notable performance decline. To address this,…
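
To make the memory cost mentioned in the abstract concrete, the following back-of-the-envelope sketch estimates KV cache size for a standard multi-head attention decoder. The helper name and the model dimensions (32 layers, 32 heads, head dimension 128, 16-bit values) are illustrative assumptions, not figures taken from the paper.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   bytes_per_value=2, batch_size=1):
    # Each layer stores one key and one value vector per head per token,
    # hence the leading factor of 2.
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * bytes_per_value * batch_size)


# Assumed dimensions for illustration only.
size = kv_cache_bytes(num_layers=32, num_kv_heads=32, head_dim=128, seq_len=2048)
print(f"{size / 2**30:.2f} GiB")
```

Under these assumptions a single 2048-token sequence occupies 1 GiB of cache, and the cost grows linearly with sequence length and batch size. That linear growth is why shrinking the cache matters for inputs with many visual tokens.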

Code & Models

Repositories

THU-MIG/PrefixKV
PyTorch · Official

Videos

PrefixKV: Adaptive Prefix KV Cache is What Vision Instruction-Following Models Need for Efficient Generation · SlidesLive

Taxonomy

Topics: Algorithms and Data Compression · Natural Language Processing Techniques · Genomics and Phylogenetic Studies