Test-Time Distribution Normalization for Contrastively Learned Vision-language Models

Yifei Zhou; Juntao Ren; Fengyu Li; Ramin Zabih; Ser-Nam Lim

arXiv:2302.11084 · cs.LG · October 20, 2023 · 1 citation

TL;DR

This paper introduces Distribution Normalization (DN), a test-time method that aligns inference with the training objective of contrastive vision-language models like CLIP, improving performance without retraining.

Contribution

It proposes a novel test-time normalization technique that captures negative sample information during inference, enhancing model accuracy across tasks.

Findings

1. DN outperforms the standard dot product on a variety of tasks
2. DN requires no retraining or fine-tuning
3. Extensive experiments validate DN's effectiveness

Abstract

Advances in the field of vision-language contrastive learning have made it possible for many downstream applications to be carried out efficiently and accurately by simply taking the dot product between image and text representations. One of the most representative approaches proposed recently, known as CLIP, has garnered widespread adoption due to its effectiveness. CLIP is trained with an InfoNCE loss that takes into account both positive and negative samples to help learn a much more robust representation space. This paper reveals that the common downstream practice of taking a dot product is only a zeroth-order approximation of the optimization goal, resulting in a loss of information at test time. Intuitively, since the model has been optimized based on the InfoNCE loss, test-time procedures should also be in alignment. The question lies in how one can retrieve any semblance of…
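
The contrast between plain dot-product scoring and a test-time normalization can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the normalization amounts to subtracting a scaled mean of test-batch embeddings from each modality before taking the dot product, so that the batch mean stands in for negative-sample information. The function names, the `lam` scale (set to 0.5 here) and the random stand-in embeddings are all hypothetical and do not come from the text above.

```python
import numpy as np


def l2_normalize(x):
    """Scale each row to unit length, as is common for CLIP-style embeddings."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def dot_product_scores(image_emb, text_emb):
    """Baseline scoring: plain image-text similarity matrix."""
    return image_emb @ text_emb.T


def mean_shifted_scores(image_emb, text_emb, image_ref, text_ref, lam=0.5):
    """Hypothetical DN-style scoring.

    Subtracts lam times the mean of a reference batch of test embeddings
    from each modality, then takes the dot product. `lam` is an assumed
    hyperparameter; lam=0 recovers the plain dot product.
    """
    mu_img = image_ref.mean(axis=0)  # mean image embedding of the test batch
    mu_txt = text_ref.mean(axis=0)   # mean text embedding of the test batch
    return (image_emb - lam * mu_img) @ (text_emb - lam * mu_txt).T


# Random vectors stand in for real image and text embeddings.
rng = np.random.default_rng(0)
image_emb = l2_normalize(rng.normal(size=(4, 8)))
text_emb = l2_normalize(rng.normal(size=(3, 8)))

baseline = dot_product_scores(image_emb, text_emb)
shifted = mean_shifted_scores(image_emb, text_emb, image_emb, text_emb)
```

Both score matrices have one row per image and one column per text, so the shifted scores can replace the baseline wherever an argmax or ranking over similarities is used. No model weights are touched, which is consistent with the claim that no retraining or fine-tuning is needed.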

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Cancer-related molecular mechanisms research

Methods: Test · Contrastive Language-Image Pre-training · Contrastive Learning · InfoNCE