EVA-CLIP: Improved Training Techniques for CLIP at Scale

Quan Sun; Yuxin Fang; Ledell Wu; Xinlong Wang; Yue Cao

arXiv:2303.15389 · cs.CV · March 28, 2023 · 80 cites

TL;DR

EVA-CLIP is a series of CLIP models trained with new techniques for representation learning, optimization, and augmentation. The models outperform previous CLIP models with the same number of parameters at significantly smaller training cost, and the release is intended to facilitate open access and open research.

Contribution

The paper presents EVA-CLIP, a set of improved training methods that boost CLIP's effectiveness and efficiency, with state-of-the-art results at scale.

Findings

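01. The largest model, EVA-02-CLIP-E/14+ (5.0B parameters, 9 billion seen samples), achieves 82.0% zero-shot top-1 accuracy on ImageNet-1K val.

02. The smaller EVA-02-CLIP-L/14+ (430 million parameters, 6 billion seen samples) attains 80.4% zero-shot top-1 accuracy on ImageNet-1K val.

03. Outperforms previous CLIP models with the same number of parameters at significantly smaller training cost.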

Abstract

Contrastive language-image pre-training, CLIP for short, has gained increasing attention for its potential in various scenarios. In this paper, we propose EVA-CLIP, a series of models that significantly improve the efficiency and effectiveness of CLIP training. Our approach incorporates new techniques for representation learning, optimization, and augmentation, enabling EVA-CLIP to achieve superior performance compared to previous CLIP models with the same number of parameters but significantly smaller training costs. Notably, our largest 5.0B-parameter EVA-02-CLIP-E/14+ with only 9 billion seen samples achieves 82.0 zero-shot top-1 accuracy on ImageNet-1K val. A smaller EVA-02-CLIP-L/14+ with only 430 million parameters and 6 billion seen samples achieves 80.4 zero-shot top-1 accuracy on ImageNet-1K val. To facilitate open access and open research, we release the complete suite of…
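The headline metric above, zero-shot top-1 accuracy, can be illustrated with a small sketch. This is a generic CLIP-style evaluation under stated assumptions, not the EVA-CLIP code: the function names are hypothetical, and the embeddings are hand-made toy vectors standing in for the outputs of an image encoder and a text encoder. Each image is assigned the class whose text embedding has the highest cosine similarity, and accuracy is the fraction of images assigned their true class.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def zero_shot_predict(image_emb, class_text_embs):
    """Return the index of the class whose text embedding best matches the image."""
    img = l2_normalize(image_emb)
    sims = [
        sum(a * b for a, b in zip(img, l2_normalize(text_emb)))
        for text_emb in class_text_embs
    ]
    return max(range(len(sims)), key=sims.__getitem__)


def zero_shot_top1(image_embs, labels, class_text_embs):
    """Fraction of images whose best-matching class is the true label."""
    hits = sum(
        zero_shot_predict(emb, class_text_embs) == label
        for emb, label in zip(image_embs, labels)
    )
    return hits / len(labels)


# Toy embeddings (hypothetical): one text embedding per class, four images.
class_text_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
image_embs = [[0.9, 0.1, 0.0], [0.2, 0.7, 0.1], [0.6, 0.5, 0.1], [0.0, 0.2, 0.8]]
labels = [0, 1, 1, 2]

accuracy = zero_shot_top1(image_embs, labels, class_text_embs)
print(f"zero-shot top-1 accuracy: {accuracy:.2f}")  # prints 0.75 for this toy data
```

In this toy data the third image is closest to class 0 but labeled 1, so three of four predictions are correct. No class-specific training is involved, which is what "zero-shot" refers to: the classifier is defined entirely by the text embeddings of the class names.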

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Cancer-related molecular mechanisms research

Methods: Contrastive Language-Image Pre-training