DisCo-CLIP: A Distributed Contrastive Loss for Memory Efficient CLIP Training

Yihao Chen; Xianbiao Qi; Jianan Wang; Lei Zhang

arXiv:2304.08480 · cs.CV · April 18, 2023 · 1 citation


TL;DR

DisCo-CLIP introduces a distributed, memory-efficient contrastive loss computation method for CLIP training, significantly reducing GPU memory usage and enabling large-batch training on fewer GPUs without loss of accuracy.

Contribution

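It decomposes the contrastive loss and its gradient computation into intra-GPU and inter-GPU parts. This reduces the GPU memory of the contrastive loss from O(B^2) to O(B^2/N), where B is the batch size and N is the number of GPUs, and thereby enables efficient large-batch CLIP training.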
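This is a rough illustration of the O(B^2) to O(B^2/N) reduction. It counts the bytes needed for a single fp32 logit matrix in two cases. In the first, every GPU materializes the full B × B matrix. In the second, each GPU computes only the (B/N) × B rows for its local samples. The helper `logits_bytes` is hypothetical, and the example values B = 32768 and N = 64 are illustrative choices for round numbers, not figures from the paper. The count also ignores constant factors such as the second (text-to-image) matrix and softmax buffers.

```python
from typing import Tuple


def logits_bytes(batch_size: int, num_gpus: int, bytes_per_elem: int = 4) -> Tuple[int, int]:
    """Hypothetical helper: bytes for one full B x B logit matrix vs. one (B/N) x B shard.

    Assumes fp32 (4 bytes per element) and that batch_size is divisible by num_gpus.
    """
    full = batch_size * batch_size * bytes_per_elem
    local_rows = batch_size // num_gpus  # samples held by one GPU
    per_gpu = local_rows * batch_size * bytes_per_elem
    return full, per_gpu


full, per_gpu = logits_bytes(32768, 64)
print(f"{full / 2**30:.1f} GiB vs {per_gpu / 2**20:.1f} MiB per GPU")  # 4.0 GiB vs 64.0 MiB per GPU
```

The ratio between the two counts is exactly N, which is where the 1/N factor in O(B^2/N) comes from.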

Findings

01. Reduces GPU memory consumption of the contrastive loss computation

02. Enables training with larger batch sizes on fewer GPUs

03. Maintains mathematical equivalence to the original contrastive loss

Abstract

We propose DisCo-CLIP, a distributed memory-efficient CLIP training approach, to reduce the memory consumption of contrastive loss when training contrastive learning models. Our approach decomposes the contrastive loss and its gradient computation into two parts, one to calculate the intra-GPU gradients and the other to compute the inter-GPU gradients. According to our decomposition, only the intra-GPU gradients are computed on the current GPU, while the inter-GPU gradients are collected via all_reduce from other GPUs instead of being repeatedly computed on every GPU. In this way, we can reduce the GPU memory consumption of contrastive loss computation from $\mathcal{O}(B^{2})$ to $\mathcal{O}(\frac{B^{2}}{N})$, where $B$ and $N$ are the batch size and the number of GPUs used for training. Such a distributed solution is mathematically equivalent to the original non-distributed contrastive loss…
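The equivalence claim can be checked numerically. The sketch below is a single-process NumPy simulation, not code from the official repository, and it runs the gradient computation two ways:

- once on the full B × B logit matrix;
- once with N simulated workers, each of which builds only (B/N) × B logit blocks for its local images and texts.

Summing the per-worker gradient matrices stands in for the all_reduce step; a real worker would then keep only the rows for its own samples. The function names (`full_grads`, `distributed_grads`), the temperature value, and the use of the analytic softmax cross-entropy gradient are assumptions made for illustration.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def full_grads(img: np.ndarray, txt: np.ndarray, tau: float):
    """Reference: gradients of the symmetric CLIP loss from one full B x B logit matrix."""
    B = img.shape[0]
    logits = img @ txt.T / tau
    eye = np.eye(B)
    # dLoss/dlogits: image-to-text (row softmax) plus text-to-image (column softmax) terms
    g = (softmax(logits, 1) - eye + softmax(logits, 0) - eye) / (2 * B)
    return g @ txt / tau, g.T @ img / tau


def distributed_grads(img: np.ndarray, txt: np.ndarray, tau: float, n_gpus: int):
    """Simulate N workers that each build only (B/N) x B logit blocks, then sum."""
    B = img.shape[0]
    b = B // n_gpus  # assumes B is divisible by n_gpus
    d_img_sum = np.zeros_like(img)
    d_txt_sum = np.zeros_like(txt)
    for k in range(n_gpus):
        rows = np.arange(k * b, (k + 1) * b)  # this worker's local samples
        targets = np.eye(B)[rows]             # one-hot matching pairs, shape (b, B)
        # Local image-to-text and text-to-image blocks, each (b, B) instead of (B, B)
        g_i2t = (softmax(img[rows] @ txt.T / tau, 1) - targets) / (2 * B)
        g_t2i = (softmax(txt[rows] @ img.T / tau, 1) - targets) / (2 * B)
        d_img = np.zeros_like(img)
        d_txt = np.zeros_like(txt)
        d_img[rows] += g_i2t @ txt / tau    # local images (intra-GPU)
        d_txt[rows] += g_t2i @ img / tau    # local texts (intra-GPU)
        d_txt += g_i2t.T @ img[rows] / tau  # all texts, incl. other workers' (inter-GPU)
        d_img += g_t2i.T @ txt[rows] / tau  # all images, incl. other workers' (inter-GPU)
        d_img_sum += d_img  # stands in for all_reduce(sum) across workers
        d_txt_sum += d_txt
    return d_img_sum, d_txt_sum


rng = np.random.default_rng(0)
img = rng.standard_normal((8, 4))
txt = rng.standard_normal((8, 4))
img /= np.linalg.norm(img, axis=1, keepdims=True)  # CLIP-style L2-normalized features
txt /= np.linalg.norm(txt, axis=1, keepdims=True)
ref = full_grads(img, txt, tau=0.07)
dist = distributed_grads(img, txt, tau=0.07, n_gpus=4)
print(all(np.allclose(r, d) for r, d in zip(ref, dist)))  # True
```

The loop body never forms more than a (B/N) × B block, which is where the memory saving comes from. The cross-worker sum is what recovers the exact full-batch gradient.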


Code & Models

Repositories

idea-research/disco-clip · PyTorch · Official


Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques

Methods: Contrastive Language-Image Pre-training · Contrastive Learning