ELIP: Efficient Discriminative Language-Image Pre-training with Fewer Vision Tokens
Yangyang Guo, Haoyu Zhang, Yongkang Wong, Liqiang Nie, Mohan Kankanhalli

TL;DR
ELIP introduces a vision token pruning method for language-image pre-training that reduces computational costs while maintaining performance, enabling faster training and larger batch sizes.
Contribution
The paper presents a novel, task-aligned token pruning approach for efficient language-image pre-training that is computation-efficient, memory-efficient, and free of trainable parameters.
Findings
Removes ~30% of vision tokens with minimal performance loss (~0.32 accuracy drop)
Maintains performance across various downstream tasks
Enables larger batch sizes and faster pre-training
Abstract
Learning a versatile language-image model is computationally prohibitive under a limited computing budget. This paper delves into efficient language-image pre-training, an area that has received relatively little attention despite its importance in reducing computational cost and footprint. To that end, we propose a vision token pruning and merging method, ELIP, to remove less influential tokens based on the supervision of language outputs. Our method is designed with several strengths, such as being computation-efficient, memory-efficient, and trainable-parameter-free, and is distinguished from previous vision-only token pruning approaches by its alignment with task objectives. We implement this method in a progressive pruning manner using several sequential blocks. To evaluate its generalization performance, we apply ELIP to three commonly used language-image pre-training…
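The pruning-and-merging idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the use of cosine similarity against a single text feature as the importance score, and the choice to average all dropped tokens into one merged token are assumptions made for illustration only. The sketch does keep one property the abstract names: it introduces no trainable parameters.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def prune_and_merge(vision_tokens, text_feature, keep_ratio=0.7):
    """Keep the vision tokens most similar to the text feature and
    merge the remaining ones into a single averaged token.

    vision_tokens: list of token vectors (lists of floats)
    text_feature:  one vector summarising the paired text (hypothetical input)
    keep_ratio:    fraction of tokens kept (assumed hyperparameter)
    """
    n = len(vision_tokens)
    n_keep = max(1, math.ceil(n * keep_ratio))

    # Score every vision token by its similarity to the text feature.
    scores = [cosine(tok, text_feature) for tok in vision_tokens]

    # Indices sorted by score, highest first; ties broken by position.
    order = sorted(range(n), key=lambda i: (-scores[i], i))
    keep_idx = sorted(order[:n_keep])  # preserve the original token order
    drop_idx = order[n_keep:]

    kept = [vision_tokens[i] for i in keep_idx]
    if drop_idx:
        # Merge the less influential tokens into one token by averaging.
        dim = len(vision_tokens[0])
        merged = [
            sum(vision_tokens[i][d] for i in drop_idx) / len(drop_idx)
            for d in range(dim)
        ]
        kept.append(merged)
    return kept, keep_idx
```

Calling such a function at several points in the vision encoder, each time on the output of the previous call, would approximate the progressive, block-by-block pruning the abstract mentions. How the text signal is obtained and where the pruning blocks sit are details the truncated abstract does not specify.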
Taxonomy
Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Topic Modeling
Methods: Pruning
