Learning Visual Representations via Language-Guided Sampling

Mohamed El Banani; Karan Desai; Justin Johnson

arXiv:2302.12248·cs.CV·March 30, 2023

TL;DR

This paper introduces a novel contrastive learning method that uses language similarity to sample semantically similar image pairs, leveraging pre-trained language models to improve visual representation learning.

Contribution

The paper proposes a new language-guided sampling approach for contrastive learning that outperforms traditional image-based and image-text methods.

Findings

01

Language-guided sampling yields better features than image-based contrastive learning.

02

Pre-trained language models effectively guide the sampling process.

03

In the authors' experiments, the learned features also surpass those from image-text representation learning.

Abstract

Although an object may appear in numerous contexts, we often describe it in a limited number of ways. Language allows us to abstract away visual variation to represent and communicate concepts. Building on this intuition, we propose an alternative approach to visual representation learning: using language similarity to sample semantically similar image pairs for contrastive learning. Our approach diverges from image-based contrastive learning by sampling view pairs using language similarity instead of hand-crafted augmentations or learned clusters. Our approach also differs from image-text contrastive learning by relying on pre-trained language models to guide the learning rather than directly minimizing a cross-modal loss. Through a series of experiments, we show that language-guided learning yields better features than image-based and image-text representation learning approaches.
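The sampling idea in the abstract can be illustrated with a minimal sketch: given one caption embedding per image, pair each image with the image whose caption is most similar. This is an illustration under stated assumptions, not the authors' implementation. The function name `sample_language_guided_pairs` is hypothetical, the toy vectors stand in for embeddings that would come from a pre-trained language model, and picking the single nearest neighbor by cosine similarity is an assumed choice.

```python
import numpy as np


def sample_language_guided_pairs(caption_embeddings):
    """For each caption i, return the index of the most similar other caption.

    Hypothetical helper: assumes one embedding per image caption and uses
    cosine similarity with a single nearest neighbor.
    """
    emb = np.asarray(caption_embeddings, dtype=np.float64)
    # L2-normalize rows so the dot product equals cosine similarity.
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sim = emb @ emb.T
    # Mask the diagonal so a caption is never paired with itself.
    np.fill_diagonal(sim, -np.inf)
    return sim.argmax(axis=1)


# Toy stand-ins for language-model caption embeddings (assumed values).
toy_embeddings = np.array([
    [1.0, 0.0, 0.1],  # e.g. "a dog on grass"
    [0.9, 0.1, 0.0],  # e.g. "a puppy in a park"
    [0.0, 1.0, 0.1],  # e.g. "a red car"
    [0.1, 0.9, 0.0],  # e.g. "a sports car on a road"
])

neighbors = sample_language_guided_pairs(toy_embeddings)
# Each (i, j) names two images that would be treated as a positive pair.
pairs = [(i, int(j)) for i, j in enumerate(neighbors)]
print(pairs)
```

In a training loop, the images at indices `i` and `j` would then serve as the two views of a positive pair for an image-only contrastive loss, replacing the augmented copies used in image-based contrastive learning.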

Code & Models

Repositories

mbanani/lgssl (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques

Methods: Contrastive Learning