FastCLIP: A Suite of Optimization Techniques to Accelerate CLIP Training with Limited Resources
Xiyuan Wei, Fanjiang Ye, Ori Yonay, Xingyu Chen, Baixi Sun, Dingwen Tao, Tianbao Yang

TL;DR
FastCLIP is a suite of optimization techniques for training CLIP with limited resources. It combines advanced compositional optimization, efficient communication strategies, and optimized training schedules to reduce computational requirements while maintaining performance.
Contribution
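The paper presents FastCLIP, a general CLIP training framework built on advanced compositional optimization techniques and designed for the distributed setting. It targets training with limited resources (e.g., up to tens of GPUs) rather than the hundreds or thousands of GPUs used in prior large-batch studies.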
Findings
FastCLIP achieves comparable or better performance than state-of-the-art methods on limited GPU setups.
The optimized training schedule and parameter update rules significantly improve training speed.
FastCLIP demonstrates scalability across different data sizes and compute resources.
Abstract
Existing studies of training state-of-the-art Contrastive Language-Image Pretraining (CLIP) models on large-scale data involve hundreds or even thousands of GPUs due to the requirement of a large batch size. However, such a large amount of resources is not accessible to most people. While advanced compositional optimization techniques for optimizing global contrastive losses have been shown to be effective at removing the requirement of a large batch size, their performance on large-scale data remains underexplored and not optimized. To bridge the gap, this paper explores several aspects of CLIP training with limited resources (e.g., up to tens of GPUs). First, we introduce FastCLIP, a general CLIP training framework built on advanced compositional optimization techniques and designed and optimized for the distributed setting. Our framework is equipped with an efficient gradient…
Taxonomy
Topics: Educational Technology and Assessment
Methods: Contrastive Language-Image Pre-training
