ELIP: Efficient Discriminative Language-Image Pre-training with Fewer Vision Tokens

Yangyang Guo; Haoyu Zhang; Yongkang Wong; Liqiang Nie; Mohan Kankanhalli

arXiv:2309.16738 · cs.CV · January 27, 2026 · 1 citation

TL;DR

ELIP introduces a vision token pruning method for language-image pre-training that reduces computational costs while maintaining performance, enabling faster training and larger batch sizes.

Contribution

The paper presents a novel, task-aligned token pruning approach for efficient language-image pre-training that is computation- and memory-efficient and introduces no trainable parameters.

Findings

01. Removes ~30% of vision tokens with minimal performance loss (~0.32 accuracy drop)

02. Maintains performance across various downstream tasks

03. Enables larger batch sizes and faster pre-training
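Findings 01 and 03 are connected by simple arithmetic: every token removed from the vision encoder shrinks both the linear layers (cost linear in token count) and self-attention (cost quadratic in token count). The back-of-the-envelope sketch below assumes a standard ViT layer (four attention projections plus an MLP with expansion ratio 4), counts only multiply-accumulates, and ignores layer norms, softmax, and the text encoder. The helper names and token schedules, including the 197-token ViT-B/16 input at 224×224 resolution, are hypothetical examples, not ELIP's actual pruning configuration.

```python
from typing import Sequence


def layer_macs(num_tokens: int, dim: int, mlp_ratio: int = 4) -> int:
    """Approximate multiply-accumulates for one ViT encoder layer."""
    n, d = num_tokens, dim
    linear = 4 * n * d * d           # Q, K, V and output projections
    attention = 2 * n * n * d        # QK^T scores plus attention-weighted sum of V
    mlp = 2 * mlp_ratio * n * d * d  # two MLP linear layers (d -> r*d -> d)
    return linear + attention + mlp


def relative_cost(token_schedule: Sequence[int], full_tokens: int, dim: int) -> float:
    """Cost of a per-layer token schedule relative to keeping every token in every layer."""
    full = layer_macs(full_tokens, dim) * len(token_schedule)
    pruned = sum(layer_macs(n, dim) for n in token_schedule)
    return pruned / full


if __name__ == "__main__":
    # Hypothetical: all 12 layers run on ~70% of the 197 tokens.
    print(f"{relative_cost([138] * 12, full_tokens=197, dim=768):.3f}")  # prints 0.692
```

With every layer at about 70% of the tokens, the estimate lands slightly below 70% of the full encoder cost because the attention term shrinks quadratically. Activation memory scales roughly the same way, which is one reason fewer tokens can leave room for larger batches.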

Abstract

Learning a versatile language-image model is computationally prohibitive under a limited computing budget. This paper delves into efficient language-image pre-training, an area that has received relatively little attention despite its importance in reducing computational cost and footprint. To that end, we propose ELIP, a vision token pruning and merging method that removes less influential tokens based on supervision from the language outputs. The method is computation-efficient, memory-efficient, and trainable-parameter-free, and it differs from previous vision-only token pruning approaches in that it is aligned with the task objectives. We implement it as progressive pruning across several sequential blocks. To evaluate its generalization performance, we apply ELIP to three commonly used language-image pre-training…
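The language-guided pruning-and-merging idea can be illustrated with a minimal sketch. It assumes, as a hypothetical simplification, that each vision patch token is scored by scaled dot-product similarity to a single pooled text embedding, that the top-scoring fraction `keep_ratio` is kept, and that the remaining tokens are fused into one token weighted by their scores. The names `prune_and_merge`, `progressive_prune`, and `keep_ratios` are illustrative only; this is not the authors' implementation, which is in the guoyang9/elip repository. Note that the step has no trainable parameters: it only ranks and averages existing tokens.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def prune_and_merge(vision_tokens: List[Vector], text_embedding: Vector,
                    keep_ratio: float) -> Tuple[List[Vector], List[int]]:
    """Hypothetical: keep the most text-relevant tokens (0 < keep_ratio <= 1), fuse the rest."""
    n = len(vision_tokens)
    num_keep = max(1, round(n * keep_ratio))
    scale = 1.0 / math.sqrt(len(text_embedding))
    # Relevance of each vision token to the (assumed) pooled text embedding.
    scores = softmax([sum(a * b for a, b in zip(tok, text_embedding)) * scale
                      for tok in vision_tokens])
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    keep_idx = sorted(order[:num_keep])  # preserve the original token order
    drop_idx = order[num_keep:]
    kept = [list(vision_tokens[i]) for i in keep_idx]
    if drop_idx:
        weights = [scores[i] for i in drop_idx]
        total = sum(weights)
        if total == 0.0:  # guard against underflow: fall back to a plain average
            weights, total = [1.0] * len(drop_idx), float(len(drop_idx))
        dim = len(vision_tokens[0])
        # Score-weighted average of the dropped tokens becomes one fused token.
        fused = [sum(w * vision_tokens[i][d] for w, i in zip(weights, drop_idx)) / total
                 for d in range(dim)]
        kept.append(fused)
    return kept, keep_idx


def progressive_prune(vision_tokens: List[Vector], text_embedding: Vector,
                      keep_ratios: Sequence[float]) -> List[Vector]:
    """Apply pruning once per stage; a real encoder would run transformer blocks in between."""
    tokens = vision_tokens
    for ratio in keep_ratios:
        tokens, _ = prune_and_merge(tokens, text_embedding, ratio)
    return tokens


if __name__ == "__main__":
    demo = [[float(i), 1.0] for i in range(8)]
    print(len(progressive_prune(demo, [1.0, 0.0], [0.7, 0.7])))  # prints 6
```

In a real encoder the [CLS] token would typically be set aside rather than scored, and the text signal would come from the language model's outputs during pre-training; both details are omitted here for brevity.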

Code & Models

Repositories

guoyang9/elip (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Topic Modeling

Methods: Pruning