PAR: Prompt-Aware Token Reduction Method for Efficient Large Multimodal Models
Yingen Liu, Fan Wu, Ruihui Li, Zhuo Tang, Kenli Li

TL;DR
PAR is a prompt-aware token reduction method that substantially lowers the computational cost of multimodal models by adaptively identifying and clustering essential visual tokens, with minimal loss in accuracy.
Contribution
This paper introduces a novel, plug-and-play token reduction approach that leverages prompt-awareness to efficiently reduce visual tokens in multimodal models without extra training.
Findings
Reduces FLOPs by 83% across visual tasks.
Achieves 89% compression ratio while retaining 97% accuracy.
Doubles token reduction compared to prior methods.
Abstract
Multimodal large language models (MLLMs) demonstrate strong performance across visual tasks, but their efficiency is hindered by significant computational and memory demands from processing long contexts in multimodal inputs. To address this, we introduce PAR (Prompt-Aware Token Reduction), a novel and plug-and-play approach that reduces visual tokens efficiently without compromising model performance. Unlike previous methods that rely heavily on attention mechanisms and overlook cross-modal interactions, we use a prompt-aware strategy to adaptively identify and cluster essential visual tokens. PAR categorizes visual context redundancy into two types: external and internal. External redundancy is minimized through semantic retrieval, while internal redundancy is addressed using a token routing mechanism. This method substantially reduces computational load without requiring…
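The two-stage idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify how semantic retrieval or token routing are computed, so the code below swaps in two simple stand-ins. External redundancy is handled by keeping the visual tokens with the highest cosine similarity to a prompt embedding, and internal redundancy by greedily merging near-duplicate tokens into averaged clusters. All function names, the `keep` budget, and the `threshold` value are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve_relevant(visual_tokens, prompt_vec, keep):
    """Stand-in for semantic retrieval (assumption): keep the `keep` tokens
    most similar to the prompt, preserving their original order."""
    ranked = sorted(
        range(len(visual_tokens)),
        key=lambda i: cosine(visual_tokens[i], prompt_vec),
        reverse=True,
    )
    kept = sorted(ranked[:keep])
    return [visual_tokens[i] for i in kept]


def merge_similar(tokens, threshold):
    """Stand-in for token routing (assumption): a token joins the first
    cluster whose centroid it matches with cosine >= threshold, otherwise
    it starts a new cluster. Returns one centroid per cluster."""
    clusters = []  # each entry is [running_sum_vector, member_count]
    for tok in tokens:
        for cluster in clusters:
            centroid = [s / cluster[1] for s in cluster[0]]
            if cosine(tok, centroid) >= threshold:
                cluster[0] = [s + t for s, t in zip(cluster[0], tok)]
                cluster[1] += 1
                break
        else:
            clusters.append([list(tok), 1])
    return [[s / n for s in total] for total, n in clusters]


def reduce_tokens(visual_tokens, prompt_vec, keep, threshold=0.95):
    """Drop prompt-irrelevant tokens, then merge near-duplicates."""
    relevant = retrieve_relevant(visual_tokens, prompt_vec, keep)
    return merge_similar(relevant, threshold)


# Toy 2-D embeddings; real visual tokens would be high-dimensional.
tokens = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
prompt = [1.0, 0.0]
reduced = reduce_tokens(tokens, prompt, keep=3)
print(len(tokens), "->", len(reduced), "tokens")
```

In this toy input, the token pointing away from the prompt is dropped in the first stage, and the two nearly parallel tokens are merged into one centroid in the second. Neither stage needs training, which is consistent with the plug-and-play framing, but the actual scoring and clustering rules used by PAR may differ.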
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications · Video Analysis and Summarization
Methods: Softmax · Attention Is All You Need
