LRCP: Low-Rank Compressibility Guided Visual Token Pruning for Efficient LVLMs
Hongyu Lu, Feng Zhang, Wenwei Jin, Huanling Hu, Tianjun Shi, Shikai Jiang, Yao Hu, Jiawei Li

TL;DR
LRCP is a training-free visual token pruning method for LVLMs that uses the low-rank structure of visual token representations to remove most visual tokens while largely preserving performance.
Contribution
The paper introduces LRCP, a token pruning framework based on low-rank compressibility that outperforms existing methods without requiring additional training.
Findings
LRCP preserves 94.7% of image understanding performance with 88.9% token reduction.
LRCP maintains 97.8% of video understanding accuracy with 87.5% token reduction.
Visual token representations exhibit a stable low-rank structure across models and datasets.
Abstract
Large vision-language models (LVLMs) achieve strong multimodal understanding, but their inference cost grows rapidly with the number of visual tokens, especially for high-resolution images and long videos. Existing attention-based methods estimate token importance from attention scores, which may introduce positional bias, while representation-based methods reduce visual redundancy based on feature relations or reconstruction errors, overlooking the global structure of the visual token set. In this paper, we revisit visual token compression from the perspective of low-rank compressibility. Across models and datasets, we observe that visual token representations exhibit a pronounced low-rank structure, with a dominant subspace that remains stable even after a large fraction of tokens is randomly removed. Motivated by this finding, we propose LRCP, a training-free compression framework…
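The low-rank observation described in the abstract can be illustrated with a small numerical sketch. The code below is not the LRCP algorithm and does not use real LVLM features: it builds a synthetic token matrix (low-rank signal plus noise), randomly drops 87.5% of the rows, and checks how much the dominant subspace changes. The helper names (`effective_rank`, `top_subspace`, `subspace_overlap`), the 99% energy threshold, and all sizes are assumptions chosen for illustration only.

```python
import numpy as np

def effective_rank(X, energy=0.99):
    """Smallest number of singular directions holding `energy` of the squared spectrum."""
    s = np.linalg.svd(X, compute_uv=False)
    cum = np.cumsum(s ** 2) / np.sum(s ** 2)
    return int(np.searchsorted(cum, energy) + 1)

def top_subspace(X, k):
    """Orthonormal basis (dim x k) of the top-k right singular subspace of X."""
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return vt[:k].T

def subspace_overlap(U, V):
    """Mean squared cosine of the principal angles between two k-dim subspaces (0 to 1)."""
    return float(np.linalg.norm(U.T @ V, "fro") ** 2 / U.shape[1])

rng = np.random.default_rng(0)
n_tokens, dim, true_rank = 576, 256, 8  # hypothetical sizes, not taken from the paper

# Synthetic stand-in for visual token features: low-rank signal plus small noise.
tokens = rng.normal(size=(n_tokens, true_rank)) @ rng.normal(size=(true_rank, dim))
tokens += 0.05 * rng.normal(size=(n_tokens, dim))

# Randomly keep 1/8 of the tokens, i.e. remove 87.5% of them.
keep = rng.choice(n_tokens, size=n_tokens // 8, replace=False)

k = effective_rank(tokens)
overlap = subspace_overlap(top_subspace(tokens, k), top_subspace(tokens[keep], k))
print(f"effective rank: {k}, subspace overlap after random removal: {overlap:.4f}")
```

An overlap close to 1 means the retained tokens span nearly the same dominant subspace as the full set. On real models, `tokens` would be replaced by the visual token hidden states of the LVLM under study, and the result would depend on the model and layer.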
