AMS-KV: Adaptive KV Caching in Multi-Scale Visual Autoregressive Transformers

Boxun Xu; Yu Wang; Zihu Wang; Peng Li

arXiv:2511.16047·cs.CV·November 21, 2025

TL;DR

This paper introduces AMS-KV, a novel adaptive key-value caching strategy for multi-scale visual autoregressive transformers that significantly reduces memory usage and latency while maintaining high generation quality.

Contribution

The paper proposes AMS-KV, a scale-adaptive KV caching policy tailored for next-scale prediction in VAR models, addressing memory growth and efficiency challenges.

Findings

01 Reduces KV cache usage by up to 84.83%.

02 Lowers self-attention latency by 60.48%.

03 Enables stable scaling to larger batch sizes.

Abstract

Visual autoregressive modeling (VAR) via next-scale prediction has emerged as a scalable image generation paradigm. While key and value (KV) caching in large language models (LLMs) has been extensively studied, next-scale prediction presents unique challenges, and KV cache design for next-scale-based VAR transformers remains largely unexplored. A major bottleneck is that KV memory grows excessively as the number of scales increases, severely limiting scalability. Our systematic investigation reveals that: (1) attending to tokens from local scales contributes significantly to generation quality; (2) allocating a small amount of memory to the coarsest scales, termed condensed scales, stabilizes multi-scale image generation; (3) strong KV similarity across finer scales is predominantly observed in cache-efficient layers, whereas cache-demanding layers exhibit weaker inter-scale…
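To make observations (1) and (2) concrete, the following is a minimal Python sketch of a scale-level retention rule: keep the KV entries of a few coarsest ("condensed") scales plus the most recent ("local") scales, and count how many cached tokens remain. It is an illustration under stated assumptions, not the paper's implementation. The function names, the parameters `num_condensed` and `num_local`, and the scale side lengths (1, 2, 3, 4, 5, 6, 8, 10, 13, 16) are hypothetical choices made for this example; the layer-wise adaptation described in observation (3) is not modeled, and the resulting counts are not meant to reproduce the figures reported above.

```python
from typing import List, Sequence


def tokens_per_scale(side_lengths: Sequence[int]) -> List[int]:
    """Number of tokens in each scale, assuming square token maps."""
    return [s * s for s in side_lengths]


def retained_scales(step: int, num_condensed: int, num_local: int) -> List[int]:
    """Indices of earlier scales kept in the cache while generating scale `step`.

    Hypothetical rule: keep the first `num_condensed` (coarsest) scales and
    the last `num_local` scales among the already generated scales 0..step-1.
    """
    return [
        k for k in range(step)
        if k < num_condensed or k >= step - num_local
    ]


def cached_tokens(
    side_lengths: Sequence[int], step: int, num_condensed: int, num_local: int
) -> int:
    """Total cached tokens (per layer, per head) under the retention rule."""
    counts = tokens_per_scale(side_lengths)
    return sum(counts[k] for k in retained_scales(step, num_condensed, num_local))


# Assumed scale schedule for illustration only.
SIDE_LENGTHS = (1, 2, 3, 4, 5, 6, 8, 10, 13, 16)
last_step = len(SIDE_LENGTHS) - 1

# Full cache: every earlier scale counts as "local".
full = cached_tokens(SIDE_LENGTHS, last_step, num_condensed=0, num_local=last_step)
# Reduced cache: 2 condensed scales plus 2 local scales.
reduced = cached_tokens(SIDE_LENGTHS, last_step, num_condensed=2, num_local=2)
print(full, reduced)
```

With this assumed schedule, the full cache before the final scale holds 424 tokens per layer and head, while the reduced cache holds 274. The saving grows as more intermediate scales fall outside both windows.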

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Advanced Neural Network Applications · Caching and Content Delivery