Fit and Prune: Fast and Training-free Visual Token Pruning for Multi-modal Large Language Models

Weihao Ye; Qiong Wu; Wenhao Lin; Yiyi Zhou

arXiv:2409.10197·cs.CV·December 30, 2024

TL;DR

This paper introduces FitPrune, a training-free method for visual token pruning in multimodal large language models that efficiently reduces computation while maintaining performance.

Contribution

It proposes a novel statistical approach to determine optimal token pruning schemes based on attention statistics, avoiding expensive retraining.

Findings

1. Reduces FLOPs by up to 54.9% with minimal accuracy loss

2. Can generate pruning recipes in about 5 minutes

3. Effective across multiple recent MLLMs

Abstract

Recent progress in Multimodal Large Language Models (MLLMs) often relies on a large number of image tokens to compensate for the visual shortcomings of MLLMs, which not only exhibits obvious redundancy but also greatly increases the already high computational cost. Token pruning is an effective way to speed up MLLMs, but when and how to drop tokens remains a challenge. In this paper, we propose a novel, training-free approach to visual token pruning for MLLMs, termed FitPrune, which can quickly produce a complete pruning recipe for an MLLM according to a predefined budget. Specifically, FitPrune treats token pruning as a statistical problem whose objective is to find an optimal pruning scheme that minimizes the divergence between the attention distributions before and after pruning. In practice, FitPrune can be quickly accomplished based on the attention statistics from…
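The idea of fitting a per-layer pruning recipe to a budget can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the use of dropped attention mass as a stand-in for the divergence, the kept-token ratio as a stand-in for the compute budget, and the independent treatment of layers are all assumptions made for illustration.

```python
# Toy sketch only. All names here are hypothetical, and "removed attention
# mass" is an assumed proxy for the attention divergence in the abstract.

def removed_attention_mass(attn, keep):
    """Fraction of attention mass lost when only the `keep`
    highest-attention visual tokens are retained."""
    total = sum(attn)
    kept = sum(sorted(attn, reverse=True)[:keep])
    return 1.0 - kept / total

def tokens_to_keep(attn, alpha):
    """Smallest number of tokens whose retention loses at most
    `alpha` of the attention mass in this layer."""
    for keep in range(1, len(attn) + 1):
        if removed_attention_mass(attn, keep) <= alpha:
            return keep
    return len(attn)

def fit_recipe(layer_attn, budget, iters=30):
    """Binary-search the tolerance alpha so that the average ratio of
    kept tokens across layers does not exceed `budget` (assumed to be
    at least 1 / tokens-per-layer). Layers are treated independently,
    which is a simplification."""
    n = len(layer_attn[0])
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        alpha = (lo + hi) / 2
        recipe = [tokens_to_keep(a, alpha) for a in layer_attn]
        ratio = sum(recipe) / (n * len(recipe))
        if ratio > budget:
            lo = alpha  # too many tokens kept: tolerate more loss
        else:
            hi = alpha  # within budget: try a tighter tolerance
    return [tokens_to_keep(a, hi) for a in layer_attn]

# Made-up attention statistics for two layers with four visual tokens each.
toy_stats = [[8, 4, 2, 2], [13, 1, 1, 1]]
print(fit_recipe(toy_stats, budget=0.5))  # [3, 1]
```

In this toy setup, the second layer concentrates most of its attention on a single token, so the search keeps only one token there and spends the rest of the budget on the first layer, where attention is more evenly spread.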

Code & Models

Repositories

ywh187/fitprune (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning

Methods: Softmax · Attention Is All You Need · Sparse Evolutionary Training · Pruning