freePruner: A Training-free Approach for Large Multimodal Model Acceleration
Bingxin Xu, Yuzhang Shang, Yunhao Ge, Qian Lou, Yan Yan

TL;DR
freePruner is a training-free token reduction method for large multimodal models (LMMs) that accelerates inference by approximately 2x without retraining, while maintaining comparable performance on visual question-answering (VQA) benchmarks.
Contribution
freePruner introduces a training-free, two-stage token selection strategy for LMM acceleration. Because it needs no retraining, it can be combined with other optimization techniques.
Findings
Achieves approximately 2x speedup on VQA benchmarks
Maintains comparable performance without retraining
Can be combined with quantization techniques
Abstract
Large Multimodal Models (LMMs) have demonstrated impressive capabilities in visual-language tasks but face significant deployment challenges due to their high computational demands. While recent token reduction methods show promise for accelerating LMMs, they typically require extensive retraining or fine-tuning, making them impractical for many state-of-the-art models, especially those with proprietary training data. We propose freePruner, a training-free token reduction approach that can be directly applied to any open-source LMM without additional training. Unlike existing methods that rely heavily on token merging operations, freePruner employs a two-stage token selection strategy: (1) identifying pivotal tokens that capture high-level semantic information using our designed contribution degree metric, and (2) selecting complementary tokens that preserve essential low-level visual…
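The two-stage selection described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not the paper's method: the function name `select_tokens`, the use of total received attention as a stand-in for the contribution degree metric, and the use of attention from a global token as the stage-two score. The abstract does not define either score, so treat both as placeholders.

```python
import numpy as np

def select_tokens(attn, global_attn, num_pivotal, num_complementary):
    """Hypothetical two-stage visual token selection (illustrative only).

    attn:        (N, N) attention weights among N visual tokens (assumed input).
    global_attn: (N,) attention from a global token to each visual token
                 (assumed input).
    Returns the sorted indices of the tokens to keep.
    """
    # Stage 1 (assumed score): rank tokens by the total attention they
    # receive, as a placeholder for the paper's contribution degree metric.
    contribution = attn.sum(axis=0)
    pivotal = np.argsort(-contribution, kind="stable")[:num_pivotal]

    # Stage 2 (assumed score): among the tokens not yet chosen, keep those
    # with the highest global attention as complementary tokens.
    remaining = np.setdiff1d(np.arange(attn.shape[0]), pivotal)
    order = np.argsort(-global_attn[remaining], kind="stable")
    complementary = remaining[order[:num_complementary]]

    # Preserve the original token order in the pruned sequence.
    return np.sort(np.concatenate([pivotal, complementary]))

# Toy usage: 4 visual tokens with 8-dimensional features.
attn = np.array([
    [0.7, 0.1, 0.1, 0.1],
    [0.6, 0.2, 0.1, 0.1],
    [0.5, 0.1, 0.3, 0.1],
    [0.4, 0.1, 0.1, 0.4],
])
global_attn = np.array([0.1, 0.5, 0.3, 0.1])
tokens = np.random.default_rng(0).normal(size=(4, 8))

keep = select_tokens(attn, global_attn, num_pivotal=2, num_complementary=1)
pruned = tokens[keep]  # reduced token sequence passed on to the language model
```

Selecting tokens by index, rather than merging them, leaves the kept features unchanged, which is one reason a scheme of this shape needs no retraining. The scores actually used by freePruner may differ from the placeholders above.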
Taxonomy
Topics: Speech Recognition and Synthesis · Advanced Data Processing Techniques · Anomaly Detection Techniques and Applications
Methods: Softmax · Attention Is All You Need
