Video Compression Commander: Plug-and-Play Inference Acceleration for Video Large Language Models

Xuyang Liu; Yiyu Wang; Junpeng Ma; Linfeng Zhang

arXiv:2505.14454 · cs.CV · November 19, 2025

TL;DR

This paper introduces VidCom2, a plug-and-play framework that adaptively compresses video tokens based on frame uniqueness, significantly accelerating VideoLLMs while maintaining high accuracy.

Contribution

We propose a novel adaptive token compression framework for VideoLLMs that addresses information loss and implementation issues, improving efficiency without sacrificing performance.

Findings

01. Achieves 99.6% of the original performance with only 25% of the tokens.

02. Reduces LLM generation latency by 70.8%.

03. Compatible with other token compression methods.
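
To put the reported figures in perspective, the short calculation below converts them into more familiar quantities. It is a back-of-the-envelope sketch, not a result from the paper: it assumes an idealized model in which self-attention cost over visual tokens grows with the square of the token count, and it ignores text tokens, MLP cost, and memory effects. The variable names are illustrative.

```python
# Figures quoted in the findings above.
retained_fraction = 0.25      # share of visual tokens kept
latency_reduction = 0.708     # reported drop in LLM generation latency

# Assumption: attention over visual tokens costs O(n^2), so keeping a
# fraction r of the tokens leaves r^2 of the token-to-token interactions.
pairwise_fraction = retained_fraction ** 2

# A latency reduction of x corresponds to a speedup factor of 1 / (1 - x).
speedup = 1.0 / (1.0 - latency_reduction)

print(f"visual-token attention pairs remaining: {pairwise_fraction:.2%}")
print(f"implied generation speedup: {speedup:.2f}x")
```

The measured speedup is smaller than the idealized quadratic saving would suggest, which is expected since attention over visual tokens is only one part of end-to-end generation cost.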

Abstract

Video large language models (VideoLLMs) excel at video understanding but face efficiency challenges due to the quadratic complexity of abundant visual tokens. Our systematic analysis of token compression methods for VideoLLMs reveals two critical issues: (i) they overlook distinctive visual signals across frames, leading to information loss; and (ii) they suffer from implementation constraints, causing incompatibility with modern architectures or efficient operators. To address these challenges, we distill three design principles for VideoLLM token compression and propose a plug-and-play inference acceleration framework, "Video Compression Commander" (VidCom2). By quantifying each frame's uniqueness, VidCom2 adaptively adjusts compression intensity across frames, effectively preserving essential information while reducing redundancy in video sequences. Extensive experiments across various…
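
The idea of scoring each frame's uniqueness and compressing unique frames less can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the uniqueness score (one minus cosine similarity between a frame's mean token and the video's mean token), the softmax budget allocation, the `temperature` parameter, and all function names are assumptions chosen for illustration only.

```python
import numpy as np

EPS = 1e-8  # guards against division by zero when normalizing


def frame_uniqueness(tokens):
    """tokens: (F, N, D) array of F frames, N tokens each, D dims.
    Returns (F,) scores; higher means less similar to the video average.
    Assumed scoring rule, not taken from the paper."""
    frame_means = tokens.mean(axis=1)                      # (F, D)
    video_mean = frame_means.mean(axis=0)                  # (D,)
    fm = frame_means / (np.linalg.norm(frame_means, axis=1, keepdims=True) + EPS)
    vm = video_mean / (np.linalg.norm(video_mean) + EPS)
    return 1.0 - fm @ vm


def allocate_budgets(scores, n_tokens, keep_ratio, temperature=1.0):
    """Split an overall token budget across frames with a softmax over
    uniqueness scores (hypothetical scheme). Budgets are floored, then
    clipped to [1, n_tokens]; leftover budget is not redistributed."""
    total = int(round(keep_ratio * len(scores) * n_tokens))
    weights = np.exp((scores - scores.max()) / temperature)
    weights /= weights.sum()
    return np.clip(np.floor(weights * total).astype(int), 1, n_tokens)


def compress(tokens, keep_ratio=0.25):
    """Keep, per frame, the tokens least similar to the video mean,
    preserving their original order. Returns (list of arrays, budgets)."""
    n_frames, n_tokens, dim = tokens.shape
    budgets = allocate_budgets(frame_uniqueness(tokens), n_tokens, keep_ratio)
    video_mean = tokens.reshape(-1, dim).mean(axis=0)
    vm = video_mean / (np.linalg.norm(video_mean) + EPS)
    kept = []
    for f in range(n_frames):
        frame = tokens[f]
        sim = (frame / (np.linalg.norm(frame, axis=1, keepdims=True) + EPS)) @ vm
        idx = np.sort(np.argsort(sim)[: budgets[f]])       # least similar first
        kept.append(frame[idx])
    return kept, budgets


# Toy input: three near-duplicate frames and one distinct frame.
demo = np.zeros((4, 16, 8))
demo[:3, :, 0] = 1.0
demo[3, :, 1] = 1.0
demo_kept, demo_budgets = compress(demo, keep_ratio=0.25)
print(demo_budgets)
```

In the toy input the distinct frame receives a larger share of the budget than the three near-duplicate frames, which is the qualitative behavior the abstract describes. Because the sketch only selects token indices, it does not depend on attention weights; whether that matches the paper's design principles should be checked against the paper itself.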

Taxonomy

Topics: Machine Learning in Healthcare · AI in cancer detection · Advanced Data Compression Techniques