Less Is More, but Where? Dynamic Token Compression via LLM-Guided Keyframe Prior

Yulin Li; Haokun Gui; Ziyang Fan; Junjie Wang; Bin Kang; Bin Chen; Zhuotao Tian

arXiv:2512.06866 · cs.CV · December 9, 2025


TL;DR

DyToK is a training-free, LLM-guided dynamic token compression method for Video Large Language Models (VLLMs). Rather than keeping or dropping whole frames, it assigns more visual tokens to semantically rich frames and fewer to the rest, speeding up inference (4.3x in the reported results) while maintaining accuracy.

Contribution

This work introduces DyToK, a novel training-free token compression approach that uses the VLLM's own attention to decide how many tokens each frame keeps, giving a better efficiency-accuracy tradeoff than existing methods.
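
The summary does not give DyToK's exact allocation rule, so the following is only a minimal sketch of the general idea of per-frame token budgets, not the authors' implementation. It assumes a non-negative score per frame (for example, the attention mass that frame receives from the text query), splits a global token budget across frames in proportion to those scores with a per-frame floor and cap, and then keeps the top-scoring tokens inside each frame. The function names `allocate_frame_budgets` and `select_tokens`, the proportional rule, and the per-token scores are all hypothetical.

```python
from itertools import accumulate
from typing import List, Sequence


def allocate_frame_budgets(frame_scores: Sequence[float], tokens_per_frame: int,
                           total_budget: int, min_tokens: int = 1) -> List[int]:
    """Split a global token budget across frames in proportion to frame scores.

    Hypothetical helper: frame_scores stands in for a query-conditioned keyframe
    prior and must be non-negative. Every frame keeps between min_tokens and
    tokens_per_frame tokens, and the budgets sum exactly to total_budget.
    """
    n = len(frame_scores)
    if n == 0:
        return []
    if not n * min_tokens <= total_budget <= n * tokens_per_frame:
        raise ValueError("total_budget must lie in [n*min_tokens, n*tokens_per_frame]")
    score_sum = sum(frame_scores)
    weights = [s / score_sum for s in frame_scores] if score_sum > 0 else [1.0 / n] * n
    spare = total_budget - n * min_tokens
    # Cumulative rounding gives integer shares that add up to `spare`.
    bounds = [0] + [round(spare * c) for c in accumulate(weights)]
    budgets = [min(tokens_per_frame, min_tokens + bounds[i + 1] - bounds[i])
               for i in range(n)]
    # Tokens clipped by the per-frame cap go to the highest-weight frames with room.
    leftover = total_budget - sum(budgets)
    order = sorted(range(n), key=lambda i: weights[i], reverse=True)
    while leftover > 0:
        for i in order:
            if leftover > 0 and budgets[i] < tokens_per_frame:
                budgets[i] += 1
                leftover -= 1
    return budgets


def select_tokens(token_scores_per_frame: Sequence[Sequence[float]],
                  budgets: Sequence[int]) -> List[List[int]]:
    """Return, per frame, the indices of its `budget` highest-scoring tokens."""
    kept = []
    for scores, k in zip(token_scores_per_frame, budgets):
        top = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]
        kept.append(sorted(top))  # restore the original spatial order
    return kept
```

With frame scores [0.6, 0.3, 0.1], 8 tokens per frame and a budget of 12, this sketch keeps 6, 4 and 2 tokens. A binary keyframe selector would instead keep either all 8 or none of a frame's tokens; the proportional split lets low-scoring frames retain a little context.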

Findings

01

Achieves 4.3x faster inference.

02

Maintains accuracy across multiple VLLMs.

03

Compatible with existing compression methods.

Abstract

Recent Video Large Language Models (VLLMs) have achieved remarkable video understanding capabilities, yet they face critical efficiency bottlenecks because computation grows quadratically with the lengthy visual token sequences of long videos. While existing keyframe sampling methods can improve temporal modeling efficiency, they introduce additional computational cost before feature encoding, and their binary frame selection paradigm proves suboptimal. In this work, we therefore propose Dynamic Token compression via LLM-guided Keyframe prior (DyToK), a training-free paradigm that enables dynamic token compression by harnessing VLLMs' inherent attention mechanisms. Our analysis reveals that VLLM attention layers naturally encode query-conditioned keyframe priors, by which DyToK dynamically adjusts per-frame token retention ratios, prioritizing semantically rich frames while…
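
As a hedged illustration of what a query-conditioned keyframe prior read from attention could look like, the sketch below pools the attention that text-query tokens pay to each frame's visual tokens into one normalized score per frame. The attention layout (query tokens as rows, visual tokens stored frame by frame as columns), the function name `frame_prior_from_attention`, and the use of plain summation are assumptions; the abstract does not say which layers or heads DyToK reads or how it aggregates them.

```python
from typing import List, Sequence


def frame_prior_from_attention(attn: Sequence[Sequence[float]],
                               tokens_per_frame: int) -> List[float]:
    """Pool query-to-visual attention into one normalized score per frame.

    Assumed layout (hypothetical): attn[q][v] is the attention weight from
    text-query token q to visual token v, with visual tokens stored frame by
    frame, frame 0 first.
    """
    num_visual = len(attn[0])
    if num_visual % tokens_per_frame:
        raise ValueError("visual tokens must split evenly into frames")
    num_frames = num_visual // tokens_per_frame
    # Total attention each visual token receives from all query tokens.
    received = [sum(row[v] for row in attn) for v in range(num_visual)]
    # Sum the tokens that belong to each frame.
    per_frame = [sum(received[f * tokens_per_frame:(f + 1) * tokens_per_frame])
                 for f in range(num_frames)]
    total = sum(per_frame)
    # Normalize to a distribution over frames; fall back to uniform if all zero.
    if total <= 0:
        return [1.0 / num_frames] * num_frames
    return [s / total for s in per_frame]
```

For two query tokens attending to two frames of two tokens each, with rows [0.1, 0.3, 0.4, 0.2] and [0.0, 0.2, 0.5, 0.3], the prior comes out at approximately [0.3, 0.7], so the second frame would receive the larger share of any token budget derived from it.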

Taxonomy

Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning