Dissecting Supervised Contrastive Learning

Florian Graf; Christoph D. Hofer; Marc Niethammer; Roland Kwitt

arXiv:2102.08817 · stat.ML · March 3, 2023 · 1 citation


TL;DR

This paper investigates the geometric properties of representations learned through supervised contrastive learning, showing that they tend to form a regular simplex, and compares the optimization behavior of the supervised contrastive loss with that of standard cross-entropy training.
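To make the regular-simplex geometry concrete, here is a minimal NumPy sketch that builds K points forming a regular simplex on a hypersphere of radius ρ. The helper name `regular_simplex` and the construction (centering the standard basis vectors of R^K, then rescaling) are illustrative assumptions, not code from the paper's repository; it assumes K ≥ 2.

```python
import numpy as np


def regular_simplex(num_classes: int, radius: float = 1.0) -> np.ndarray:
    """Hypothetical helper: rows are the vertices of a regular simplex on a sphere.

    Centers the standard basis vectors e_1, ..., e_K of R^K by subtracting their
    mean, then rescales each row to the requested norm. The vertices sum to zero
    and every pair has cosine similarity -1 / (K - 1). Requires K >= 2.
    """
    eye = np.eye(num_classes)
    centered = eye - eye.mean(axis=0, keepdims=True)  # row k: e_k - (1/K) * ones
    norms = np.linalg.norm(centered, axis=1, keepdims=True)
    return radius * centered / norms


if __name__ == "__main__":
    K, rho = 4, 2.0
    V = regular_simplex(K, rho)
    gram = V @ V.T
    # Diagonal entries equal rho**2; off-diagonal entries equal -rho**2 / (K - 1).
    print(np.round(gram, 6))
```

The Gram matrix makes the defining property visible: all vertices have the same norm ρ, and every pair of distinct vertices has the same inner product, -ρ²/(K-1), which is the most negative value K equal-norm vectors can share.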

Contribution

It provides a theoretical analysis of the representation geometry in supervised contrastive learning and empirically demonstrates differences in optimization dynamics compared to cross-entropy.
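Because the comparison throughout is against cross-entropy training, the sketch below shows that baseline objective: cross-entropy over the softmax scores of a linear map applied to encoder outputs. It is a minimal NumPy illustration under assumed toy values (three classes, features already placed on the vertices of an equilateral triangle of radius ρ = 2); `cross_entropy` and all variables are hypothetical, and the example does not reproduce the paper's analysis or training setup.

```python
import numpy as np


def cross_entropy(features, weights, labels, bias=None):
    """Mean cross-entropy of softmax(features @ weights.T + bias) against integer labels."""
    logits = features @ weights.T
    if bias is not None:
        logits = logits + bias
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())


# Hypothetical toy setup: K = 3 classes, encoder outputs already collapsed to
# the vertices of an equilateral triangle (a regular 2-simplex) of radius rho.
rho = 2.0
angles = np.deg2rad([90.0, 210.0, 330.0])
vertices = rho * np.stack([np.cos(angles), np.sin(angles)], axis=1)
labels = np.repeat(np.arange(3), 5)          # 5 samples per class
features = vertices[labels]                  # every sample sits on its class vertex

aligned_w = vertices / rho                   # unit-norm rows pointing at the class vertices
shuffled_w = np.roll(aligned_w, 1, axis=0)   # same rows, assigned to the wrong classes

print("aligned: ", cross_entropy(features, aligned_w, labels))
print("shuffled:", cross_entropy(features, shuffled_w, labels))
```

With the classifier rows pointing at the class vertices, each correct logit is 2 and each wrong logit is -1, so the loss equals log(1 + 2e^{-3}); assigning the same rows to the wrong classes makes it much larger.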

Findings

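1. At minimal loss, the representations of each class collapse to one vertex of a regular simplex inscribed in a hypersphere.
2. Close-to-optimal states correlate with good generalization.
3. With the supervised contrastive loss, the number of iterations needed to perfectly fit the training data grows superlinearly with the amount of randomly flipped labels, whereas approximately linear scaling has been reported for cross-entropy.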
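The label-noise experiments behind the third finding require corrupting a chosen fraction of the training labels. A minimal NumPy sketch of one way to do this follows; the helper `flip_labels` is hypothetical, and drawing each new label uniformly from the other classes is an assumption, since the paper's exact corruption protocol may differ.

```python
import numpy as np


def flip_labels(labels, num_classes, fraction, seed=0):
    """Hypothetical helper: move a random `fraction` of labels to a different class.

    Returns the corrupted label array and the indices that were changed. Each
    selected label is replaced by one drawn uniformly from the other
    num_classes - 1 classes (an assumption; other corruption schemes exist).
    """
    rng = np.random.default_rng(seed)
    noisy = np.asarray(labels).copy()
    n = noisy.shape[0]
    num_flip = int(round(fraction * n))
    idx = rng.choice(n, size=num_flip, replace=False)
    # Offsets in [1, num_classes - 1] guarantee the new label differs from the old one.
    offsets = rng.integers(1, num_classes, size=num_flip)
    noisy[idx] = (noisy[idx] + offsets) % num_classes
    return noisy, idx


if __name__ == "__main__":
    clean = np.arange(1000) % 10
    for p in (0.0, 0.2, 0.4):
        noisy, changed = flip_labels(clean, num_classes=10, fraction=p)
        print(p, len(changed), float((noisy != clean).mean()))
```

Using a nonzero offset modulo K, rather than a fresh uniform draw over all K classes, makes the requested fraction the actual fraction of changed labels.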

Abstract

Minimizing cross-entropy over the softmax scores of a linear map composed with a high-capacity encoder is arguably the most popular choice for training neural networks on supervised learning tasks. However, recent works show that one can directly optimize the encoder instead, to obtain equally (or even more) discriminative representations via a supervised variant of a contrastive objective. In this work, we address the question whether there are fundamental differences in the sought-for representation geometry in the output space of the encoder at minimal loss. Specifically, we prove, under mild assumptions, that both losses attain their minimum once the representations of each class collapse to the vertices of a regular simplex, inscribed in a hypersphere. We provide empirical evidence that this configuration is attained in practice and that reaching a close-to-optimal state typically…
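The abstract contrasts cross-entropy with a supervised variant of a contrastive objective that acts directly on the encoder outputs. The following is a minimal NumPy sketch assuming the common supervised contrastive formulation of Khosla et al. (2020): each anchor's positives are the other samples of its class, and the softmax denominator runs over all other samples in the batch. The function names, the `temperature` parameter, and the toy batch are illustrative assumptions, and the paper's exact variant may differ. Scaling embeddings to radius ρ with temperature 1 is equivalent to unit-norm embeddings with temperature 1/ρ².

```python
import numpy as np


def supcon_loss(z, labels, temperature=1.0):
    """Supervised contrastive loss (Khosla et al. style), averaged over anchors.

    z: (n, d) embeddings; labels: (n,) integer class labels. For each anchor i,
    positives are the other samples with the same label, and the softmax
    denominator runs over every sample except i itself.
    """
    z = np.asarray(z, dtype=float)
    labels = np.asarray(labels)
    n = z.shape[0]
    self_mask = np.eye(n, dtype=bool)
    logits = np.where(self_mask, -np.inf, z @ z.T / temperature)  # drop self-pairs
    row_max = logits.max(axis=1, keepdims=True)  # stabilizes the log-sum-exp
    log_denom = row_max + np.log(np.exp(logits - row_max).sum(axis=1, keepdims=True))
    log_prob = logits - log_denom
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask
    num_pos = pos_mask.sum(axis=1)
    has_pos = num_pos > 0  # anchors without positives are skipped
    pos_sum = np.where(pos_mask, log_prob, 0.0).sum(axis=1)
    return float(np.mean(-pos_sum[has_pos] / num_pos[has_pos]))


def regular_simplex(num_classes, radius=1.0):
    """Hypothetical helper: rows are regular-simplex vertices with norm `radius`."""
    eye = np.eye(num_classes)
    centered = eye - eye.mean(axis=0, keepdims=True)
    return radius * centered / np.linalg.norm(centered, axis=1, keepdims=True)


if __name__ == "__main__":
    K, m, rho = 4, 8, 2.0                    # classes, samples per class, sphere radius
    y = np.repeat(np.arange(K), m)
    collapsed = regular_simplex(K, rho)[y]   # every sample sits on its class vertex
    rng = np.random.default_rng(0)
    random_z = rng.normal(size=collapsed.shape)
    random_z = rho * random_z / np.linalg.norm(random_z, axis=1, keepdims=True)
    print("collapsed:", supcon_loss(collapsed, y))
    print("random:   ", supcon_loss(random_z, y))
```

On this toy batch the class-collapsed simplex configuration yields a much smaller loss than random embeddings of the same norm. This is consistent with, but not a proof of, the minimizer described in the abstract.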

Code & Models

Repositories

plus-rkwitt/py_supcon_vs_ce (PyTorch, official)

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Generative Adversarial Networks and Image Synthesis

Methods: Softmax