CAPA: Contribution-Aware Pruning and FFN Approximation for Efficient Large Vision-Language Models
Samyak Jha, Junho Kim

TL;DR
CAPA introduces a novel framework for pruning visual tokens and approximating FFN computations in large vision-language models, leading to improved efficiency and robustness without significant performance loss.
Contribution
CAPA proposes Attention Contribution as a more accurate criterion for visual token importance and introduces a dual strategy for vision-language models: pruning visual tokens and approximating FFN computations.
Findings
CAPA achieves significant efficiency gains with minimal performance loss.
Attention Contribution estimates token importance more accurately than raw attention scores.
Redundancy in FFNs allows for effective linear approximations.
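The finding that visual-token FFN computation is redundant enough to be approximated linearly can be illustrated with a minimal sketch. The abstract below is truncated and does not state how CAPA builds its approximation, so this sketch assumes an ordinary least-squares fit of a linear map with bias from FFN inputs to FFN outputs on calibration activations. The toy two-layer GELU MLP, the random data, and the names `ffn` and `fit_linear_approx` are hypothetical stand-ins, not the paper's method or code.

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def ffn(x, w1, b1, w2, b2):
    # Hypothetical toy FFN standing in for a transformer MLP block
    return gelu(x @ w1 + b1) @ w2 + b2

def fit_linear_approx(x, y):
    """Least-squares fit of y ~ x @ W + b on calibration activations (assumed procedure)."""
    x_aug = np.hstack([x, np.ones((x.shape[0], 1))])   # append a bias column
    coef, *_ = np.linalg.lstsq(x_aug, y, rcond=None)
    return coef[:-1], coef[-1]                           # W: (d, d), b: (d,)

# Usage on random stand-in "visual token" hidden states
rng = np.random.default_rng(0)
d, h, n = 16, 64, 512
w1, b1 = rng.normal(size=(d, h)) / np.sqrt(d), np.zeros(h)
w2, b2 = rng.normal(size=(h, d)) / np.sqrt(h), np.zeros(d)
x = rng.normal(size=(n, d))
y = ffn(x, w1, b1, w2, b2)
W, b = fit_linear_approx(x, y)
# Residual of the linear fit relative to a constant (mean) predictor
rel_err = np.linalg.norm(x @ W + b - y) / np.linalg.norm(y - y.mean(axis=0))
```

Because the fit includes an intercept, its residual on the calibration set can never exceed that of simply predicting the mean output; how small `rel_err` gets on real activations is exactly the redundancy question the paper studies.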
Abstract
Efficient inference in Large Vision-Language Models is constrained by the high cost of processing thousands of visual tokens, yet it remains unclear which tokens and computations can be safely removed. While attention scores are commonly used to estimate visual token importance, they are an imperfect proxy for actual contribution. We show that Attention Contribution, which weights attention probabilities by value vector magnitude, provides a more accurate criterion for visual token selection. Our empirical analysis reveals that visual attention sinks are functionally heterogeneous, comprising Probability Dumps with low contribution that can be safely pruned, and Structural Anchors with high contribution essential for maintaining model performance. Further, we identify substantial redundancy in Feed-Forward Networks (FFNs) associated with visual tokens, particularly in intermediate…
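The Attention Contribution criterion described above can be sketched as follows. For each head, the attention probability a_ij from query i to key j is scaled by the norm of that key's value vector, ||v_j||. The abstract does not say how scores are aggregated or how many tokens are kept, so averaging over heads and queries and keeping a fixed top fraction of visual tokens are assumptions of this sketch. The function names `attention_contribution` and `prune_visual_tokens` are hypothetical.

```python
import numpy as np

def attention_contribution(attn, values):
    """Score each key token by attention probability times value-vector norm.

    attn:   (heads, queries, keys) attention probabilities
    values: (heads, keys, head_dim) value vectors
    Aggregating by the mean over heads and queries is an assumption.
    """
    v_norm = np.linalg.norm(values, axis=-1)      # (heads, keys)
    contrib = attn * v_norm[:, None, :]           # (heads, queries, keys)
    return contrib.mean(axis=(0, 1))              # (keys,)

def prune_visual_tokens(scores, visual_idx, keep_ratio):
    """Keep the highest-scoring fraction of visual tokens (hypothetical policy)."""
    visual_idx = np.asarray(visual_idx)
    k = max(1, int(round(keep_ratio * len(visual_idx))))
    order = np.argsort(-scores[visual_idx], kind="stable")
    return np.sort(visual_idx[order[:k]])         # kept token indices, in order

# Toy example: key 0 draws most attention but has a tiny value vector.
attn = np.array([[[0.7, 0.2, 0.1]]])              # 1 head, 1 query, 3 keys
values = np.array([[[0.01, 0.0], [1.0, 0.0], [0.0, 1.0]]])
scores = attention_contribution(attn, values)      # ~ [0.007, 0.2, 0.1]
kept = prune_visual_tokens(scores, [0, 1, 2], keep_ratio=0.67)  # [1, 2]
```

Ranking by raw attention would keep key 0 first. Weighting by value norm instead ranks it last, which mirrors the abstract's description of low-contribution Probability Dumps that can be pruned safely.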
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications
