Turbo: Informativity-Driven Acceleration Plug-In for Vision-Language Large Models

Chen Ju; Haicheng Wang; Haozhe Cheng; Xu Chen; Zhonghua Zhai; Weilin Huang; Jinsong Lan; Shuai Xiao; Bo Zheng

arXiv:2407.11717·cs.CV·July 17, 2024

TL;DR

This paper introduces Turbo, a plug-in module that accelerates vision-language models by pruning tokens based on their information content, effectively reducing computation costs while maintaining performance.

Contribution

The paper takes a data-centric approach to model acceleration: it designs a token pruning method, guided by information degree, that is applicable across various VLMs.

Findings

1. Turbo achieves significant speed-up with negligible performance loss.

2. The method is compatible with multiple VLM architectures.

3. It requires no re-training or complex engineering.

Abstract

Vision-Language Large Models (VLMs) have recently become a primary backbone of AI, owing to their impressive performance. However, their expensive computation costs, in terms of throughput and delay, impede their potential in real-world scenarios. To accelerate VLMs, most existing methods take the model perspective (pruning, distillation, quantization) but completely overlook data-perspective redundancy. To fill this gap, this paper highlights the severity of data redundancy and designs a plug-and-play Turbo module, guided by information degree, to prune inefficient tokens from visual or textual data. In pursuit of efficiency-performance trade-offs, information degree takes two crucial factors into consideration: mutual redundancy and semantic value. Concretely, the former evaluates data duplication between sequential tokens, while the latter evaluates each token by its…
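The idea of scoring tokens by two factors can be illustrated with a minimal sketch. It is not the paper's formulation: the abstract is truncated before it defines semantic value, so the sketch takes `semantic_value` as a caller-supplied list of scores, and it assumes mutual redundancy can be approximated by cosine similarity between each token and its predecessor. The function names, the `alpha` weight, the `keep_ratio` parameter, and the subtraction used to combine the two factors are all hypothetical choices made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def prune_tokens(tokens, semantic_value, keep_ratio=0.5, alpha=1.0):
    """Return the indices of the tokens to keep, in their original order.

    tokens:         list of embedding vectors (lists of floats)
    semantic_value: one score per token; how it is computed is left to the
                    caller (ASSUMPTION: the source does not define it here)
    keep_ratio:     fraction of tokens to keep (hypothetical parameter)
    alpha:          weight of the redundancy penalty (hypothetical parameter)
    """
    if len(tokens) != len(semantic_value):
        raise ValueError("tokens and semantic_value must have the same length")
    if not tokens:
        return []

    # ASSUMPTION: redundancy of token i is its similarity to token i-1.
    # The first token has no predecessor, so its redundancy is 0.
    redundancy = [0.0] + [
        max(0.0, cosine(tokens[i], tokens[i - 1])) for i in range(1, len(tokens))
    ]

    # ASSUMPTION: combine the two factors by subtracting a weighted penalty.
    scores = [s - alpha * r for s, r in zip(semantic_value, redundancy)]

    k = max(1, math.ceil(keep_ratio * len(tokens)))
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


# Four tokens forming two duplicated pairs, all with equal semantic value.
example_tokens = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
example_values = [0.5, 0.5, 0.5, 0.5]
kept = prune_tokens(example_tokens, example_values, keep_ratio=0.5)
print(kept)
```

In this toy input the second token of each duplicated pair receives the full redundancy penalty, so the first token of each pair is kept. A real plug-in would operate on batched tensors inside the model's attention blocks rather than on Python lists.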

Code & Models

Repositories

anakin-skywalker-joseph/folder
pytorch

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Advanced Neural Network Applications

Methods: Focus