Efficient Vision-Language Reasoning via Adaptive Token Pruning

Xue Li; Xiaonan Song; Henry Hu

arXiv:2512.12701 · cs.CV · December 16, 2025

TL;DR

This paper presents Adaptive Token Pruning (ATP), a dynamic method that reduces the computational cost of vision-language models by retaining only the most relevant visual tokens, yielding faster inference with minimal accuracy loss.

Contribution

The paper introduces ATP, a novel, input-adaptive token pruning mechanism that reduces inference cost without altering the backbone architecture of existing vision-language models.

Findings

01. Reduces inference FLOPs by ~40%.
02. Achieves a ~1.5x end-to-end latency speedup.
03. Keeps the accuracy drop below 1%.
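As a rough consistency check on these figures, assume (a simplification not made in the paper) that end-to-end latency scaled linearly with FLOPs. A 40% FLOP reduction would then cap the speedup at:

```latex
\text{speedup}_{\max} = \frac{1}{1 - 0.40} = \frac{1}{0.60} \approx 1.67\times
```

The reported ~1.5x sits below this idealized bound. That gap could reflect parts of the pipeline whose cost does not shrink in proportion to the number of pruned tokens.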

Abstract

Real-world deployment of Vision-Language Models (VLMs) is hindered by high computational demands, as existing architectures inefficiently process all tokens uniformly. We introduce Adaptive Token Pruning (ATP), a dynamic inference mechanism that retains only the most informative tokens based on contextual relevance. ATP operates at the vision-language interface. It assigns each visual token a hybrid importance score that combines ViT CLS attention (intra-modal saliency) and CLIP text-image similarity (inter-modal relevance), and keeps the top-K tokens for the LLM. Unlike static compression, ATP adapts to each input without modifying the backbone. Designed as a lightweight gating module, ATP is compatible with popular backbones such as BLIP-2, LLaVA, and Flamingo. Preliminary evaluations across VQAv2, GQA, and COCO indicate that ATP reduces inference FLOPs by around 40% and achieves roughly 1.5x speedups in end-to-end…
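The scoring and selection step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not say how the two signals are normalized or combined, so several details here are hypothetical:

- the min-max normalization;
- the convex weight `alpha`;
- all function names (`minmax`, `cosine`, `hybrid_scores`, `prune_tokens`).

The sketch also assumes that each visual token has an embedding directly comparable to the text embedding, as in a shared CLIP-style space. It uses plain Python lists in place of tensors.

```python
import math
from typing import List, Sequence


def minmax(xs: Sequence[float]) -> List[float]:
    """Rescale scores to [0, 1]; a constant vector maps to all zeros."""
    lo, hi = min(xs), max(xs)
    if hi == lo:
        return [0.0 for _ in xs]
    return [(x - lo) / (hi - lo) for x in xs]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def hybrid_scores(
    cls_attn: Sequence[float],
    patch_embs: Sequence[Sequence[float]],
    text_emb: Sequence[float],
    alpha: float = 0.5,
) -> List[float]:
    """Hypothetical hybrid score: a convex mix of normalized CLS attention
    (intra-modal saliency) and token-text cosine similarity (inter-modal
    relevance). The mixing rule and alpha are assumptions, not from ATP."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if len(cls_attn) != len(patch_embs):
        raise ValueError("need one CLS attention value per visual token")
    sim = [cosine(p, text_emb) for p in patch_embs]
    a, s = minmax(cls_attn), minmax(sim)
    return [alpha * ai + (1 - alpha) * si for ai, si in zip(a, s)]


def prune_tokens(tokens: Sequence[str], scores: Sequence[float], k: int) -> List[str]:
    """Keep the k highest-scoring tokens, returned in their original order."""
    k = max(0, min(k, len(tokens)))
    top = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)[:k]
    return [tokens[i] for i in sorted(top)]


if __name__ == "__main__":
    tokens = ["t0", "t1", "t2", "t3"]          # placeholder visual tokens
    cls_attn = [0.1, 0.4, 0.2, 0.3]            # toy CLS attention weights
    patch_embs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, 0.0]]
    text_emb = [1.0, 0.0]                      # toy text embedding
    scores = hybrid_scores(cls_attn, patch_embs, text_emb, alpha=0.5)
    print(prune_tokens(tokens, scores, k=2))   # → ['t2', 't3']
```

`prune_tokens` returns the selected tokens in their original positional order rather than sorted by score. This assumes the downstream LLM expects spatially ordered visual tokens; the abstract does not say whether ATP does this. Setting `alpha=1.0` reduces the score to CLS attention alone, and `alpha=0.0` reduces it to text relevance alone.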

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Topic Modeling