CLOOB: Modern Hopfield Networks with InfoLOOB Outperform CLIP

Andreas Fürst; Elisabeth Rumetshofer; Johannes Lehner; Viet Tran; Fei Tang; Hubert Ramsauer; David Kreil; Michael Kopp; Günter Klambauer; Angela Bitto-Nemling; Sepp Hochreiter

arXiv:2110.11316 · cs.LG · November 8, 2022 · 30 cites

TL;DR

CLOOB combines modern Hopfield networks with the InfoLOOB objective to extract more of the covariance structure in multi-modal data, which improves zero-shot transfer learning performance over CLIP.

Contribution

The paper proposes CLOOB, a new method that integrates modern Hopfield networks and the InfoLOOB objective to address explaining away in CLIP-like models.

Findings

01

CLOOB outperforms CLIP in zero-shot transfer tasks across multiple datasets.

02

Modern Hopfield networks enrich the covariance structure of the retrieved embeddings.

03

The InfoLOOB objective mitigates saturation effects, improving learning stability.
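
The difference between the two objectives named in the findings can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names, the temperature `tau`, and the one-directional form are assumptions made here for brevity, and the sketch applies the losses to raw embeddings rather than to Hopfield-retrieved ones. The only difference between the two functions is whether the positive pair appears in the denominator. InfoNCE keeps it there, so the loss flattens toward zero once the positive pair dominates. InfoLOOB (a leave-one-out bound) drops it, so the loss keeps decreasing as the positive similarity grows.

```python
import numpy as np

def _normalize(x):
    # Scale each row to unit length so dot products are cosine similarities.
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def _log_sum_exp(a):
    # Row-wise log-sum-exp, shifted by the row maximum for numerical stability.
    m = a.max(axis=1, keepdims=True)
    return m[:, 0] + np.log(np.exp(a - m).sum(axis=1))

def info_nce(x, y, tau=0.1):
    """One-directional InfoNCE: the positive pair is part of the denominator."""
    logits = _normalize(x) @ _normalize(y).T / tau  # (N, N), positives on the diagonal
    return float(np.mean(_log_sum_exp(logits) - np.diag(logits)))

def info_loob(x, y, tau=0.1):
    """One-directional InfoLOOB: the positive pair is left out of the denominator."""
    logits = _normalize(x) @ _normalize(y).T / tau
    n = logits.shape[0]  # requires n >= 2 so each row has at least one negative
    negatives = np.where(np.eye(n, dtype=bool), -np.inf, logits)
    return float(np.mean(_log_sum_exp(negatives) - np.diag(logits)))

# Hypothetical toy batch: four perfectly matched, mutually orthogonal pairs.
x = np.eye(4)
y = np.eye(4)
print("InfoNCE :", info_nce(x, y))   # close to zero: the objective has saturated
print("InfoLOOB:", info_loob(x, y))  # negative: still rewards a larger positive similarity
```

On this toy batch the InfoNCE value is bounded below by zero, while the InfoLOOB value is not, which is one way to see the saturation effect the finding refers to.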

Abstract

CLIP yielded impressive results on zero-shot transfer learning tasks and is considered a foundation model like BERT or GPT3. CLIP vision models that have a rich representation are pre-trained using the InfoNCE objective and natural language supervision before they are fine-tuned on particular tasks. Though CLIP excels at zero-shot transfer learning, it suffers from an explaining away problem, that is, it focuses on one or a few features, while neglecting other relevant features. This problem is caused by insufficiently extracting the covariance structure in the original multi-modal data. We suggest using modern Hopfield networks to tackle the problem of explaining away. Their retrieved embeddings have an enriched covariance structure derived from co-occurrences of features in the stored embeddings. However, modern Hopfield networks increase the saturation effect of the InfoNCE…
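
The retrieval step the abstract refers to can be illustrated with the standard one-step update of a modern Hopfield network, in which each query is replaced by a softmax-weighted average of the stored patterns. The sketch below is an assumption-laden illustration, not the code from the official repository: the function name `hopfield_retrieve`, the inverse temperature `beta`, and the final re-normalization are choices made here.

```python
import numpy as np

def hopfield_retrieve(queries, stored, beta=8.0):
    """One-step modern Hopfield retrieval (illustrative sketch).

    queries: (M, d) array of state patterns.
    stored:  (N, d) array of stored patterns.
    Returns an (M, d) array where each row is a softmax-weighted
    average of the stored patterns, rescaled to unit length.
    """
    scores = beta * (queries @ stored.T)           # (M, N) scaled similarities
    scores = scores - scores.max(axis=1, keepdims=True)  # stabilize the softmax
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    retrieved = weights @ stored                   # mix stored patterns per query
    # Assumes no retrieved row is the zero vector.
    return retrieved / np.linalg.norm(retrieved, axis=1, keepdims=True)

# Hypothetical toy example: three orthogonal stored patterns, queried with themselves.
stored = np.eye(3)
print(hopfield_retrieve(stored, stored, beta=50.0))  # sharp retrieval: close to the stored patterns
print(hopfield_retrieve(stored, stored, beta=0.0))   # uniform weights: every row is the normalized mean
```

Because each retrieved embedding is an average over stored embeddings, features that co-occur across the stored set reinforce one another, which is the sense in which the retrieved embeddings carry additional covariance structure.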

Code & Models

Repositories

ml-jku/cloob
PyTorch · Official

Models

🤗
rinna/japanese-cloob-vit-b-16
model · 1.8k downloads · ♡ 13

Videos

CLOOB: Modern Hopfield Networks with InfoLOOB Outperform CLIP · SlidesLive

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Human Pose and Action Recognition

Methods: Attention Is All You Need · Linear Layer · Dense Connections · Dropout · Adam · Attention Dropout · Linear Warmup With Linear Decay · WordPiece · Weight Decay