$f$-MICL: Understanding and Generalizing InfoNCE-based Contrastive Learning
Yiwei Lu, Guojun Zhang, Sun Sun, Hongyu Guo, Yaoliang Yu

TL;DR
This paper introduces $f$-MICL, a generalized framework for contrastive learning that extends InfoNCE by using $f$-divergences, proposes a new similarity measure, and demonstrates improved empirical performance across vision and language tasks.
Contribution
The paper develops a generalized $f$-MICL framework that extends InfoNCE with $f$-divergences and introduces an $f$-Gaussian similarity for better interpretability and performance.
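For readers unfamiliar with $f$-divergences, the general definition for discrete distributions is $D_f(P \| Q) = \sum_i q_i \, f(p_i / q_i)$, where $f$ is convex with $f(1) = 0$. The following is a minimal sketch of that textbook definition, not the paper's implementation. The function names are hypothetical, and the two generators shown (KL and Pearson chi-squared) are standard examples that are not necessarily the ones evaluated in the paper.

```python
import math


def f_divergence(p, q, f):
    """Discrete f-divergence D_f(P || Q) = sum_i q_i * f(p_i / q_i).

    Assumes p and q are probability vectors of equal length and that
    q_i > 0 wherever p_i > 0; terms with q_i == 0 are skipped.
    """
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q) if qi > 0)


def kl_generator(u):
    """f(u) = u log u, which recovers the KL divergence."""
    return u * math.log(u) if u > 0 else 0.0


def pearson_chi2_generator(u):
    """f(u) = (u - 1)^2, which recovers the Pearson chi-squared divergence."""
    return (u - 1.0) ** 2


# Illustrative (made-up) distributions.
p = [0.5, 0.5]
q = [0.25, 0.75]
kl_value = f_divergence(p, q, kl_generator)
chi2_value = f_divergence(p, q, pearson_chi2_generator)
```

Swapping the generator changes the divergence without touching the rest of the computation, which is the sense in which a KL-based objective can be generalized to a family of objectives.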
Findings
$f$-MICL objectives achieve similar or, in many cases, better performance than InfoNCE-based methods.
The $f$-Gaussian similarity improves interpretability and empirical results.
Performance varies with different $f$-divergences depending on task and dataset.
Abstract
In self-supervised contrastive learning, a widely-adopted objective function is InfoNCE, which uses the heuristic cosine similarity for the representation comparison, and is closely related to maximizing the Kullback-Leibler (KL)-based mutual information. In this paper, we aim at answering two intriguing questions: (1) Can we go beyond the KL-based objective? (2) Besides the popular cosine similarity, can we design a better similarity function? We provide answers to both questions by generalizing the KL-based mutual information to the $f$-Mutual Information in Contrastive Learning ($f$-MICL) using the $f$-divergences. To answer the first question, we provide a wide range of $f$-MICL objectives which share the nice properties of InfoNCE (e.g., alignment and uniformity), and meanwhile result in similar or even superior performance. For the second question, assuming that the joint feature…
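As a point of reference for the baseline the abstract describes, here is a minimal sketch of an InfoNCE-style loss with cosine similarity. It assumes a batch in which `positives[i]` is the positive pair for `anchors[i]` and every other entry in the batch serves as a negative. The function names and the `temperature` default are illustrative assumptions, not values taken from the paper.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(anchors, positives, temperature=0.5):
    """Mean InfoNCE-style loss over a batch.

    For anchor i, the positive is positives[i]; positives[j] with j != i
    act as negatives. temperature=0.5 is an arbitrary illustrative choice.
    """
    n = len(anchors)
    total = 0.0
    for i in range(n):
        logits = [
            cosine_similarity(anchors[i], positives[j]) / temperature
            for j in range(n)
        ]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
        # Cross-entropy with the matching index as the correct class.
        total += log_sum_exp - logits[i]
    return total / n


# Toy features: matched pairs give a lower loss than mismatched pairs.
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned_loss = info_nce_loss(anchors, [[1.0, 0.0], [0.0, 1.0]])
misaligned_loss = info_nce_loss(anchors, [[0.0, 1.0], [1.0, 0.0]])
```

The loss falls as each anchor becomes more similar to its own positive than to the other samples in the batch. This is the behavior that the cosine similarity and the KL-based objective jointly determine, and it is the part the paper sets out to generalize.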
Taxonomy
Topics: Fuzzy Logic and Control Systems
Methods: Batch Normalization · Momentum Contrast · InfoNCE · Contrastive Learning
