Rotation-Aligned Key Channel Pruning for Efficient Vision-Language Model Inference

Beomseok Kang; Dongwon Jo; Jiwon Song; Donghwee Son; Jae-Joon Kim

arXiv:2605.19218 · cs.CV · May 20, 2026

TL;DR

This paper introduces RotateK, a rotation-based structured key channel pruning method for vision-language models that improves inference efficiency by balancing accuracy and latency under fixed memory constraints.

Contribution

RotateK employs an online PCA-based rotation to enable accurate, hardware-friendly key channel pruning, addressing structural trade-offs in prior methods.

Findings

01

RotateK outperforms prior key channel pruning methods in both accuracy and latency.

02

Joint token-channel pruning surpasses token-only baselines at the same cache budget.

03

Experiments on two VLM backbones validate the method's effectiveness.
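
The "same cache budget" comparison in finding 02 can be made concrete with a little arithmetic. The sketch below is illustrative only: the head dimension, the budget, and the 50% key pruning ratio are assumed values, not figures from the paper, and the helper `tokens_within_budget` is a hypothetical name. It also ignores any overhead for storing a rotation matrix.

```python
def tokens_within_budget(budget_entries, key_channels, value_channels):
    """Tokens that fit when each cached token stores key_channels + value_channels entries."""
    return budget_entries // (key_channels + value_channels)

head_dim = 128                          # assumed per-head dimension
budget = 1000 * (head_dim + head_dim)   # assumed budget: 1000 tokens with full keys and values

# Baseline: keys and values both keep all channels.
full = tokens_within_budget(budget, head_dim, head_dim)

# Key channel pruning: keep half of the key channels, leave values untouched.
pruned = tokens_within_budget(budget, head_dim // 2, head_dim)

print(full, pruned)  # 1000 1333
```

Under these assumptions, halving the key channels lets roughly a third more tokens fit in the same cache, which is the room that joint token-channel pruning can spend on keeping visual tokens that token-only pruning would discard.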

Abstract

Vision-Language Models suffer severe KV cache pressure at inference, as a single image often encodes into thousands of tokens. Most existing methods exploit token sparsity through token pruning, but permanently discarding visual content causes substantial degradation on fine-grained perception tasks. This motivates a complementary axis, feature sparsity: under a fixed KV cache budget, compressing the channel dimension preserves more visual tokens at the same memory cost. Prior Key channel pruning methods, however, face a structural trade-off: the token-wise approach is expressive but unstructured and slow, while the head-wise approach is hardware-friendly but less robust. We resolve this with RotateK, a rotation-based structured Key channel pruning framework. RotateK applies an online PCA-based rotation that aligns token-dependent channel importance into a shared low-dimensional…
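
The idea of rotating Keys so that one shared set of channels can be pruned can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: it assumes a single attention head, uses an uncentered PCA (eigenvectors of the Key second-moment matrix) as a stand-in for the paper's online rotation, ignores positional encodings, and runs on synthetic data. The function names `pca_rotation` and `rotate_and_prune` are hypothetical.

```python
import numpy as np

def pca_rotation(keys):
    """Orthogonal matrix whose columns are principal directions of `keys`,
    sorted by descending energy (uncentered PCA, an assumption of this sketch)."""
    second_moment = keys.T @ keys / keys.shape[0]
    eigvals, eigvecs = np.linalg.eigh(second_moment)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]
    return eigvecs[:, order]

def rotate_and_prune(queries, keys, keep):
    """Rotate queries and keys with the same matrix, then keep the first `keep` channels.
    Every token keeps the same channels, so the pruned Key cache stays a dense matrix."""
    rot = pca_rotation(keys)
    return (queries @ rot)[:, :keep], (keys @ rot)[:, :keep]

rng = np.random.default_rng(0)
n_tokens, head_dim, rank = 256, 64, 8

# Synthetic keys: low-rank structure plus small noise (illustrative data only).
basis = rng.standard_normal((rank, head_dim))
keys = rng.standard_normal((n_tokens, rank)) @ basis
keys += 0.01 * rng.standard_normal((n_tokens, head_dim))
queries = rng.standard_normal((4, head_dim))

full_scores = queries @ keys.T

# With no pruning, an orthogonal rotation leaves the attention logits unchanged.
q_rot, k_rot = rotate_and_prune(queries, keys, keep=head_dim)
exact = np.allclose(q_rot @ k_rot.T, full_scores)

# Keeping 16 of 64 rotated channels approximates the logits closely on this data.
q_small, k_small = rotate_and_prune(queries, keys, keep=16)
rel_err = np.linalg.norm(full_scores - q_small @ k_small.T) / np.linalg.norm(full_scores)
print(exact, rel_err < 0.05)
```

Because the rotation is orthogonal, applying it to both queries and Keys preserves their dot products, so any accuracy loss comes only from the channels that are dropped afterwards. How RotateK computes the rotation online and handles real model details is described in the full paper, not in this sketch.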
