Mitigating Information Loss under High Pruning Rates for Efficient Large Vision Language Models
Mingyu Fu, Wei Suo, Ji Ma, Lin Yuanbo Wu, Peng Wang, Yanning Zhang

TL;DR
This paper introduces ACCM, a method that uses adaptive captioning to preserve visual information in large vision language models, enabling high pruning rates with minimal performance loss and reduced computational costs.
Contribution
The paper proposes ACCM, a novel adaptive content compensation technique that employs self-supervised captioning and selection to mitigate information loss during pruning in LVLMs.
Findings
ACCM outperforms existing methods across seven benchmarks.
Achieves 20.6% higher accuracy with 6.5% fewer FLOPs.
Effectively preserves visual information at high pruning rates.
Abstract
Despite the great success of Large Vision Language Models (LVLMs), their high computational cost severely limits their broad application. This cost mainly stems from the visual sequence of the input, which consists of hundreds or even thousands of tokens. Although existing methods have made progress by removing redundant tokens, they suffer from severe performance degradation at high pruning rates due to the loss of visual information. In this paper, we propose an Adaptive Content Compensation Method (ACCM), which can effectively mitigate the visual information loss via an image caption. Specifically, ACCM comprises two key components: a lightweight caption model and a selector. First, the caption model generates question-related descriptions under the guidance of the user instruction. Then the selector further identifies a contextually appropriate caption from…
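The two-stage idea in the abstract (prune visual tokens, then compensate with a selected caption) can be illustrated with a toy sketch. This is not the authors' implementation. The string tokens, the per-token importance scores, the `word_overlap` relevance function, and every function name below are hypothetical stand-ins chosen for illustration. In the paper, the caption model and the selector are learned components.

```python
from typing import Callable, List, Sequence


def prune_tokens(tokens: Sequence[str], scores: Sequence[float],
                 keep_ratio: float) -> List[str]:
    """Keep the top-scoring fraction of visual tokens, in original order.

    `scores` is an assumed per-token importance value; how such scores
    are obtained is outside the scope of this sketch.
    """
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    k = max(1, round(len(tokens) * keep_ratio))
    top = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)[:k]
    return [tokens[i] for i in sorted(top)]


def word_overlap(caption: str, question: str) -> float:
    """Hypothetical relevance score: fraction of question words in the caption."""
    c_words = set(caption.lower().split())
    q_words = set(question.lower().split())
    return len(c_words & q_words) / max(1, len(q_words))


def select_caption(candidates: Sequence[str], question: str,
                   relevance: Callable[[str, str], float]) -> str:
    """Pick the candidate caption with the highest relevance to the question."""
    if not candidates:
        raise ValueError("need at least one candidate caption")
    return max(candidates, key=lambda c: relevance(c, question))


def build_input(tokens: Sequence[str], scores: Sequence[float],
                keep_ratio: float, candidates: Sequence[str],
                question: str) -> List[str]:
    """Concatenate kept visual tokens, the selected caption, and the question."""
    kept = prune_tokens(tokens, scores, keep_ratio)
    caption = select_caption(candidates, question, word_overlap)
    return kept + caption.split() + question.split()


if __name__ == "__main__":
    seq = build_input(
        tokens=["v0", "v1", "v2", "v3"],
        scores=[0.1, 0.9, 0.3, 0.8],
        keep_ratio=0.5,
        candidates=["a dog on grass", "a red ball near the dog"],
        question="what color is the ball",
    )
    print(seq)
```

The point of the sketch is only the data flow: a short caption chosen for its relevance to the question rides along with the surviving visual tokens, so information discarded by aggressive pruning can still reach the language model in text form.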
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications
