Turbo: Informativity-Driven Acceleration Plug-In for Vision-Language Models

Chen Ju; Haicheng Wang; Zeqian Li; Xu Chen; Zhonghua Zhai; Weilin Huang; Shuai Xiao

arXiv:2312.07408 · cs.CV · December 13, 2023 · 1 citation

TL;DR

This paper introduces Turbo, a plug-in module that accelerates vision-language models by pruning tokens based on their information content, effectively reducing computation costs while maintaining performance.

Contribution

The paper pioneers a data-centric approach to model acceleration: a token pruning method, guided by information degree, that applies across various VLMs.

Findings

01 Significant speed-up in VLMs with minimal performance loss

02 Compatible with multiple VLM architectures and tasks

03 Simple plug-in design requiring no retraining

Abstract

Vision-Language Large Models (VLMs) have become a primary backbone of AI owing to their impressive performance. However, their expensive computation costs, in terms of throughput and delay, impede their potential in real-world scenarios. To accelerate VLMs, most existing methods focus on the model perspective (pruning, distillation, quantization) but completely overlook redundancy from the data perspective. To fill this gap, this paper highlights the severity of data redundancy and designs a plug-and-play Turbo module, guided by information degree, to prune inefficient tokens from visual or textual data. In pursuit of efficiency-performance trade-offs, information degree takes two key factors into consideration: mutual redundancy and semantic value. Concretely, the former evaluates the data duplication between sequential tokens, while the latter evaluates each token by its contribution to…
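
The pruning idea in the abstract can be illustrated with a small sketch. This is not the paper's implementation: the abstract is cut off before the semantic-value definition is complete, so the sketch takes a per-token semantic score as a plain input. The cosine-similarity measure of redundancy, the subtraction that combines the two factors, the weight `alpha`, and all function names are assumptions made for illustration only.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mutual_redundancy(tokens):
    """Assumed redundancy measure: similarity of each token to its predecessor.

    The first token has no predecessor, so its redundancy is 0.0.
    """
    return [0.0] + [cosine(tokens[i - 1], tokens[i]) for i in range(1, len(tokens))]


def information_degree(tokens, semantic_value, alpha=1.0):
    """Assumed combination: reward semantic value, penalize redundancy."""
    redundancy = mutual_redundancy(tokens)
    return [alpha * s - r for s, r in zip(semantic_value, redundancy)]


def prune_tokens(tokens, semantic_value, keep, alpha=1.0):
    """Return the indices of the `keep` highest-scoring tokens, in original order."""
    scores = information_degree(tokens, semantic_value, alpha)
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:keep])


if __name__ == "__main__":
    # Four toy token embeddings: tokens 1 and 3 duplicate their predecessors.
    tokens = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
    semantic_value = [0.5, 0.5, 0.5, 0.9]  # hypothetical per-token scores
    print(prune_tokens(tokens, semantic_value, keep=2))
```

In this toy input the duplicated tokens are dropped even though one of them has the highest semantic score, which shows how the two factors trade off. A real module would operate on batched tensors inside the model and would likely merge or drop tokens per layer; those details are not given in the text above.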

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Explainable Artificial Intelligence (XAI)
