LLaVA-Scissor: Token Compression with Semantic Connected Components for Video LLMs

Boyuan Sun; Jiaxing Zhao; Xihan Wei; Qibin Hou

arXiv:2506.21862·cs.CV·June 30, 2025

TL;DR

LLaVA-Scissor introduces a semantic connected components-based token compression method for video LLMs, effectively reducing tokens while maintaining semantic coverage and improving performance on various video understanding tasks.

Contribution

It proposes a novel, training-free token compression strategy using semantic connected components for better semantic coverage in video LLMs.

Findings

01 · Outperforms existing token compression methods in benchmarks.

02 · Maintains high performance at low token retention ratios.

03 · Effective in diverse video understanding tasks.

Abstract

In this paper, we present LLaVA-Scissor, a training-free token compression strategy designed for video multimodal large language models. Previous methods mostly attempt to compress tokens based on attention scores, but they fail to effectively capture all semantic regions and often lead to token redundancy. In contrast, we propose to leverage the Semantic Connected Components (SCC) approach, which assigns tokens to distinct semantic regions within the token set, ensuring comprehensive semantic coverage. The outcome is a two-step spatio-temporal token compression strategy that utilizes SCC in both the spatial and temporal domains. This strategy can effectively compress tokens by representing the entire video with a set of non-overlapping semantic tokens. We conduct extensive evaluations of the token compression capabilities of LLaVA-Scissor across diverse video understanding benchmarks, including…
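To make the idea in the abstract concrete, here is a minimal sketch of connected-component token compression, not the authors' implementation. It assumes that two tokens are linked when their cosine similarity exceeds a threshold `tau`, that components are found with union-find, and that each component is represented by the mean of its tokens. The names `connected_components`, `compress`, `compress_video`, and `tau` are hypothetical, and the official repository may differ in every one of these choices.

```python
import math
from typing import Dict, List, Sequence

Token = Sequence[float]


def cosine(a: Token, b: Token) -> float:
    """Cosine similarity between two token vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def connected_components(tokens: Sequence[Token], tau: float) -> List[List[int]]:
    """Group token indices into components of the graph whose edges are
    token pairs with cosine similarity above tau (assumed linking rule)."""
    parent = list(range(len(tokens)))

    def find(i: int) -> int:
        # Walk to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(tokens)):
        for j in range(i + 1, len(tokens)):
            if cosine(tokens[i], tokens[j]) > tau:
                root_i, root_j = find(i), find(j)
                if root_i != root_j:
                    parent[root_j] = root_i

    groups: Dict[int, List[int]] = {}
    for i in range(len(tokens)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def compress(tokens: Sequence[Token], tau: float) -> List[List[float]]:
    """Replace each component with the mean of its tokens (assumed merge rule)."""
    if not tokens:
        return []
    dim = len(tokens[0])
    return [
        [sum(tokens[i][d] for i in comp) / len(comp) for d in range(dim)]
        for comp in connected_components(tokens, tau)
    ]


def compress_video(frames: Sequence[Sequence[Token]], tau: float) -> List[List[float]]:
    """Two-step sketch: compress within each frame, then across all frames."""
    spatial = [tok for frame in frames for tok in compress(frame, tau)]
    return compress(spatial, tau)


# Toy input: two frames with 2-D tokens pointing in roughly two directions.
frames = [
    [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]],
    [[1.0, 0.05], [0.1, 1.0]],
]
video_tokens = compress_video(frames, tau=0.9)
print(len(video_tokens))
```

Because every token belongs to exactly one component, the resulting groups do not overlap and together cover the whole token set, which is the property the abstract emphasizes. Note that connected components are transitive: two tokens can land in the same group through a chain of similar neighbors even when they are not directly similar to each other.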

Code & Models

Repositories

HumanMLLM/LLaVA-Scissor
PyTorch · Official

Taxonomy

Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning

Methods: Sparse Evolutionary Training