DisCo-CLIP: A Distributed Contrastive Loss for Memory Efficient CLIP Training
Yihao Chen, Xianbiao Qi, Jianan Wang, Lei Zhang

TL;DR
DisCo-CLIP introduces a distributed, memory-efficient contrastive loss computation method for CLIP training, significantly reducing GPU memory usage and enabling large-batch training on fewer GPUs without loss of accuracy.
Contribution
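DisCo-CLIP decomposes the contrastive loss and its gradient computation into intra-GPU and inter-GPU parts, reducing the memory consumed by the loss computation from O(B^2) to O(B^2/N) for batch size B and N GPUs, which enables efficient large-batch CLIP training.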
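To make the O(B^2) versus O(B^2/N) scaling concrete, here is a rough back-of-the-envelope sketch. It counts only a single similarity matrix in 32-bit floats and ignores gradients, activations, and framework overhead. The helper name similarity_matrix_bytes, the batch size of 32,768, and the GPU count of 64 are illustrative assumptions, not figures taken from the paper.

```python
def similarity_matrix_bytes(batch_size, n_gpus=1, bytes_per_element=4):
    """Bytes needed for one GPU's block of the similarity matrix.

    With n_gpus=1 this is the full B x B matrix; otherwise each GPU
    holds a (B / n_gpus) x B block. Hypothetical helper for illustration.
    """
    rows_per_gpu = batch_size // n_gpus
    return rows_per_gpu * batch_size * bytes_per_element


# Illustrative values only (assumed, not from the paper).
full = similarity_matrix_bytes(32768)
sharded = similarity_matrix_bytes(32768, n_gpus=64)

print(full / 2**30, "GiB")     # 4.0 GiB
print(sharded / 2**20, "MiB")  # 64.0 MiB
```

Under these assumptions the per-GPU block is N times smaller than the full matrix, which is the source of the 1/N factor.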
Findings
Reduces the GPU memory consumed by the contrastive loss computation from O(B^2) to O(B^2/N)
Enables training with larger batch sizes on fewer GPUs
Remains mathematically equivalent to the original non-distributed contrastive loss
Abstract
We propose DisCo-CLIP, a distributed memory-efficient CLIP training approach, to reduce the memory consumption of contrastive loss when training contrastive learning models. Our approach decomposes the contrastive loss and its gradient computation into two parts, one to calculate the intra-GPU gradients and the other to compute the inter-GPU gradients. According to our decomposition, only the intra-GPU gradients are computed on the current GPU, while the inter-GPU gradients are collected via all_reduce from other GPUs instead of being repeatedly computed on every GPU. In this way, we can reduce the GPU memory consumption of contrastive loss computation from O(B^2) to O(B^2/N), where B and N are the batch size and the number of GPUs used for training. Such a distributed solution is mathematically equivalent to the original non-distributed contrastive loss…
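The decomposition can be checked numerically with a small single-process simulation. The sketch below is not the authors' implementation: it uses NumPy instead of a real distributed backend, omits the temperature and feature normalization used in CLIP, and simulates all_reduce by summing per-rank contributions in a loop. The function names (full_grads, sharded_grads) and the sizes B=8, D=4, N=4 are assumptions chosen for illustration. Each simulated rank only ever forms two (B/N) x B blocks of the similarity matrix, yet the summed gradients match those obtained from the full B x B matrix.

```python
import numpy as np


def softmax(x):
    x = x - x.max(axis=1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=1, keepdims=True)


def ce_grad_logits(logits, targets, total):
    """Gradient of (1/total) * sum of row-wise cross-entropies w.r.t. logits."""
    g = softmax(logits)
    g[np.arange(len(targets)), targets] -= 1.0
    return g / total


def full_grads(img, txt):
    """Reference: gradients computed from the full B x B similarity matrix."""
    B = img.shape[0]
    idx = np.arange(B)
    g_i2t = ce_grad_logits(img @ txt.T, idx, 2 * B)  # image-to-text term
    g_t2i = ce_grad_logits(txt @ img.T, idx, 2 * B)  # text-to-image term
    d_img = g_i2t @ txt + g_t2i.T @ txt
    d_txt = g_i2t.T @ img + g_t2i @ img
    return d_img, d_txt


def sharded_grads(img, txt, n_gpus):
    """Simulated ranks: each forms only (B/N) x B blocks, then contributions are summed."""
    B = img.shape[0]
    b = B // n_gpus
    d_img = np.zeros_like(img)
    d_txt = np.zeros_like(txt)
    for rank in range(n_gpus):
        sl = slice(rank * b, (rank + 1) * b)
        targets = np.arange(sl.start, sl.stop)
        # Local images against all (gathered) texts: a b x B block.
        g_i2t = ce_grad_logits(img[sl] @ txt.T, targets, 2 * B)
        # Local texts against all (gathered) images: a b x B block.
        g_t2i = ce_grad_logits(txt[sl] @ img.T, targets, 2 * B)
        # Gradients for this rank's own features.
        d_img[sl] += g_i2t @ txt
        d_txt[sl] += g_t2i @ img
        # Gradients for every rank's features; accumulating them across the
        # loop stands in for the all_reduce sum in a real distributed run.
        d_txt += g_i2t.T @ img[sl]
        d_img += g_t2i.T @ txt[sl]
    return d_img, d_txt


rng = np.random.default_rng(0)
img = rng.normal(size=(8, 4))  # B=8 image features, D=4 (illustrative sizes)
txt = rng.normal(size=(8, 4))  # B=8 text features

ref_img, ref_txt = full_grads(img, txt)
sim_img, sim_txt = sharded_grads(img, txt, n_gpus=4)
print("max abs difference:", max(np.abs(ref_img - sim_img).max(),
                                 np.abs(ref_txt - sim_txt).max()))
```

In an actual multi-GPU run, the accumulation into d_img and d_txt would be replaced by a collective operation across processes; the loop here only demonstrates that the per-rank pieces add up to the same gradients.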
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques
Methods: Contrastive Language-Image Pre-training · Contrastive Learning
