UniCLIP: Unified Framework for Contrastive Language-Image Pre-training
Janghyeon Lee, Jongsuk Kim, Hyounguk Shon, Bumsoo Kim, Seung Hwan Kim, Honglak Lee, Junmo Kim

TL;DR
UniCLIP introduces a unified contrastive learning framework that combines inter-domain and intra-domain losses into a single space, improving vision-language pre-training effectiveness across multiple tasks.
Contribution
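The paper proposes a unified framework whose three key components allow inter-domain (image-text) and intra-domain (image-image) contrastive losses to be integrated in a single embedding space, with the aim of improving the data efficiency and transferability of vision-language models.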
Findings
UniCLIP outperforms previous vision-language pre-training methods on various downstream tasks.
Each component of UniCLIP significantly improves performance.
Unified contrastive loss effectively combines inter- and intra-domain learning.
Abstract
Pre-training vision-language models with contrastive objectives has shown promising results that are both scalable to large uncurated datasets and transferable to many downstream applications. Several follow-up works have aimed to improve data efficiency by adding self-supervision terms, but the inter-domain (image-text) contrastive loss and the intra-domain (image-image) contrastive loss are defined on individual spaces in those works, so many feasible combinations of supervision are overlooked. To overcome this issue, we propose UniCLIP, a Unified framework for Contrastive Language-Image Pre-training. UniCLIP integrates the contrastive loss of both inter-domain pairs and intra-domain pairs into a single universal space. The discrepancies that occur when integrating contrastive loss between different domains are resolved by the three key components of UniCLIP: (1) augmentation-aware feature…
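The idea of placing inter-domain and intra-domain pairs in one space can be illustrated with a small sketch. This is not the loss from the paper: it is a generic multi-positive InfoNCE written in plain Python, and the function names, the `sample_ids` convention, and the single shared `temperature` are all assumptions made for illustration. Every embedding (image view or caption) is compared against every other embedding, and any two embeddings that come from the same image-text sample count as a positive pair, whether the pair is image-image or image-text.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def unified_contrastive_loss(embeddings, sample_ids, temperature=0.1):
    """Generic multi-positive InfoNCE over one shared space (illustrative only).

    embeddings : list of vectors; image views and captions are mixed together.
    sample_ids : sample_ids[i] identifies the image-text sample that
                 embedding i was derived from (hypothetical convention).
    """
    z = [l2_normalize(v) for v in embeddings]
    n = len(z)
    total, count = 0.0, 0
    for i in range(n):
        # Temperature-scaled similarity of anchor i to every other embedding.
        logits = {
            j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(n)
            if j != i
        }
        log_denominator = math.log(sum(math.exp(s) for s in logits.values()))
        for j, score in logits.items():
            # Same source sample => positive pair, regardless of domain.
            if sample_ids[j] == sample_ids[i]:
                total += log_denominator - score
                count += 1
    if count == 0:
        raise ValueError("no positive pairs: every sample_id is unique")
    return total / count


# Two samples, each with two image views and one caption embedding (toy values).
toy_embeddings = [
    [1.0, 0.1], [1.0, -0.1], [0.9, 0.0],   # sample 0: view, view, caption
    [0.1, 1.0], [-0.1, 1.0], [0.0, 0.9],   # sample 1: view, view, caption
]
aligned_loss = unified_contrastive_loss(toy_embeddings, [0, 0, 0, 1, 1, 1])
shuffled_loss = unified_contrastive_loss(toy_embeddings, [0, 1, 0, 1, 0, 1])
print(aligned_loss < shuffled_loss)
```

When the sample labels match the geometry of the embeddings, the loss is lower than when the labels are shuffled, which is the behaviour a contrastive objective is meant to reward. The components the abstract lists for resolving discrepancies between domains are not modelled here.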
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Cancer-related molecular mechanisms research
