Design of the topology for contrastive visual-textual alignment

Zhun Sun

arXiv:2209.02127 · cs.CV · October 10, 2023

TL;DR

This paper investigates the role of the softmax temperature in contrastive visual-textual alignment and proposes a new topology using an oblique manifold to improve zero-shot classification performance.

Contribution

The paper introduces a novel topology for embedding alignment based on an oblique manifold and demonstrates that it improves zero-shot classification accuracy.
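As a rough illustration of what placing an embedding on an oblique manifold involves: an oblique manifold can be described as a product of unit spheres, so a feature vector is split into several sub-vectors and each one is normalized to unit length. The sketch below assumes only this standard definition. The helper name `to_oblique` and the equal-size chunking are hypothetical and are not taken from the paper or its repository.

```python
import math

def to_oblique(vec, m):
    """Split vec into m equal chunks and L2-normalize each chunk.

    Hypothetical helper: the result is a point on a product of m unit
    spheres, which is one common description of an oblique manifold.
    """
    if m <= 0 or len(vec) % m != 0:
        raise ValueError("len(vec) must be a positive multiple of m")
    n = len(vec) // m
    chunks = []
    for i in range(m):
        chunk = vec[i * n:(i + 1) * n]
        norm = math.sqrt(sum(x * x for x in chunk))
        if norm == 0.0:
            raise ValueError("cannot normalize a zero chunk")
        chunks.append([x / norm for x in chunk])
    return chunks

# A 4-dimensional feature mapped onto a product of two unit circles.
point = to_oblique([3.0, 4.0, 0.0, 2.0], m=2)
```

With `m=1` this reduces to ordinary unit-sphere normalization, the setting in which cosine similarity is usually computed.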

Findings

01. The proposed approach improves zero-shot classification performance by an average of 6.1%.

02. The softmax temperature is identified as a key factor in contrastive learning on noisy data.

03. The proposed topology is designed to better capture the structure of the embedding space for contrastive tasks.

Abstract

Cosine similarity is the common choice for measuring the distance between the feature representations in contrastive visual-textual alignment learning. However, empirically a learnable softmax temperature parameter is required when learning on large-scale noisy training data. In this work, we first discuss the role of the softmax temperature from the perspective of the embedding space's topological properties. We argue that the softmax temperature is the key mechanism for contrastive learning on noisy training data. It acts as a scaling factor of the distance range (e.g. [-1, 1] for the cosine similarity), and its learned value indicates the level of noise in the training data. Then, we propose an alternative design of the topology for the embedding alignment. We make use of multiple class tokens in the transformer architecture, then map the feature representations onto an oblique manifold endowed with the…
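The scaling role of the temperature described above can be seen in a small numerical sketch. It assumes a generic softmax over cosine similarities multiplied by a scale factor (the inverse temperature). The function names and toy vectors are hypothetical, and this is not the paper's training code.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors; lies in [-1, 1]."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def softmax(logits):
    """Numerically stable softmax."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def match_probs(image_vec, text_vecs, scale):
    """Softmax over cosine similarities multiplied by scale (1 / temperature)."""
    return softmax([scale * cosine(image_vec, t) for t in text_vecs])

# Toy example: the first text is a perfect match, the last is opposite.
image = [1.0, 0.0]
texts = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]

p_unscaled = match_probs(image, texts, scale=1.0)
p_scaled = match_probs(image, texts, scale=100.0)
```

Without scaling, the similarities stay within [-1, 1], so even a perfect match among three candidates receives a probability of only about 0.67. A larger scale widens the effective range and lets the softmax approach a one-hot distribution.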

Code & Models

Repositories

minogame/clip-mtob (PyTorch, official)

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques

Methods: Softmax