Misalign, Contrast then Distill: Rethinking Misalignments in Language-Image Pretraining
Bumsoo Kim, Yeonsik Jo, Jinhyung Kim, Seung Hwan Kim

TL;DR
This paper introduces MCD, a novel metric learning approach that leverages the image-text misalignments caused by image augmentations to improve vision-language pretraining, achieving state-of-the-art transferability on classification and retrieval.
Contribution
Rather than ignoring or suppressing the image-text misalignments introduced by augmentation, MCD treats them as an additional training signal, improving data efficiency and transferability in contrastive pretraining.
Findings
Outperforms previous methods on multiple downstream tasks.
Effectively predicts the scale of misalignment introduced by augmentations.
Achieves state-of-the-art transferability in classification and retrieval.
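The summary does not say how MCD defines the "misalignment scale" it predicts. As a purely hypothetical illustration of why a crop can weaken an image-text pair, the sketch below uses one simple proxy: the fraction of the original image area that a random crop discards. The function names `crop_misalignment_scale` and `sample_crop`, the proxy itself, and the crop sampler are assumptions for illustration, not the authors' definition.

```python
import random
from typing import Tuple

# (left, top, right, bottom) in pixels
Box = Tuple[int, int, int, int]


def crop_misalignment_scale(img_w: int, img_h: int, box: Box) -> float:
    """Hypothetical proxy: fraction of the original image area removed by a crop.

    0.0 means the crop keeps the whole image, so the caption still fully applies;
    values near 1.0 mean most of the content the caption describes may be gone.
    """
    left, top, right, bottom = box
    if not (0 <= left < right <= img_w and 0 <= top < bottom <= img_h):
        raise ValueError("crop box must lie inside the image")
    kept = (right - left) * (bottom - top)
    return 1.0 - kept / (img_w * img_h)


def sample_crop(img_w: int, img_h: int, rng: random.Random,
                min_scale: float = 0.08) -> Box:
    """Sample an axis-aligned crop covering a random fraction of the image area.

    Assumption for simplicity: the crop keeps the image's aspect ratio, unlike
    common random-resized-crop augmentations that also jitter it.
    """
    area_frac = rng.uniform(min_scale, 1.0)
    side_frac = area_frac ** 0.5  # same scale on both sides -> area_frac of the area
    w = max(1, int(round(img_w * side_frac)))
    h = max(1, int(round(img_h * side_frac)))
    left = rng.randint(0, img_w - w)
    top = rng.randint(0, img_h - h)
    return (left, top, left + w, top + h)


if __name__ == "__main__":
    rng = random.Random(0)
    box = sample_crop(224, 224, rng)
    print(box, round(crop_misalignment_scale(224, 224, box), 3))
```

Under this proxy, a quarter-area crop of a 100x100 image yields 0.75, while the uncropped image yields 0.0; a real target could also account for color jitter, flips, or caption content.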
Abstract
Contrastive Language-Image Pretraining has emerged as a prominent approach for training vision and text encoders with uncurated image-text pairs from the web. To enhance data efficiency, recent efforts have introduced additional supervision terms that involve randomly augmented views of the image. However, since the image augmentation process is unaware of its text counterpart, this procedure could cause various degrees of image-text misalignments during training. Prior methods either disregarded this discrepancy or introduced external models to mitigate the impact of misalignments during training. In contrast, we propose a novel metric learning approach that capitalizes on these misalignments as an additional training source, which we term "Misalign, Contrast then Distill (MCD)". Unlike previous methods that treat augmented images and their text counterparts as simple positive pairs, MCD…
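The baseline the abstract builds on is the CLIP-style contrastive objective: each image in a batch should match its own caption, and every other caption in the batch acts as a negative (and vice versa). A minimal NumPy sketch of that symmetric InfoNCE loss follows, assuming L2-normalized embeddings and a fixed temperature of 0.07 (CLIP's initial value; CLIP itself learns the temperature). This is the standard baseline only, not MCD's objective, which the truncated abstract does not specify; the function names are illustrative.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    # Scale each row to unit length so dot products become cosine similarities.
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def log_softmax(logits: np.ndarray, axis: int) -> np.ndarray:
    # Numerically stable log-softmax along the given axis.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def clip_contrastive_loss(img_emb: np.ndarray, txt_emb: np.ndarray,
                          temperature: float = 0.07) -> float:
    """Symmetric InfoNCE over a batch of N paired (image, text) embeddings."""
    img = l2_normalize(img_emb)
    txt = l2_normalize(txt_emb)
    logits = img @ txt.T / temperature  # (N, N); matching pairs on the diagonal
    idx = np.arange(logits.shape[0])
    # Image -> text: normalize each row over all captions in the batch.
    loss_i2t = -log_softmax(logits, axis=1)[idx, idx].mean()
    # Text -> image: normalize each column over all images in the batch.
    loss_t2i = -log_softmax(logits, axis=0)[idx, idx].mean()
    return float((loss_i2t + loss_t2i) / 2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    images = rng.normal(size=(8, 512))
    texts = images + 0.1 * rng.normal(size=(8, 512))  # roughly aligned pairs
    print(clip_contrastive_loss(images, texts))
```

Treating an augmented view and its caption as a plain positive pair in this loss is exactly the assumption the abstract questions: a heavily cropped image is still pushed toward its full caption as strongly as the original.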
Videos
Misalign, Contrast then Distill: Rethinking Misalignments in Language-Image Pre-training (YouTube)
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · COVID-19 diagnosis using AI
