Dissecting Supervised Contrastive Learning
Florian Graf, Christoph D. Hofer, Marc Niethammer, Roland Kwitt

TL;DR
This paper investigates the geometry of representations learned through supervised contrastive learning, showing that at minimal loss the representations of each class collapse to the vertices of a regular simplex, and compares the optimization behavior of the supervised contrastive loss with that of cross-entropy training.
Contribution
It provides a theoretical analysis of the representation geometry in supervised contrastive learning and empirically demonstrates differences in optimization dynamics compared to cross-entropy.
Findings
At minimal loss, the representations of each class collapse to the vertices of a regular simplex inscribed in a hypersphere.
Close-to-optimal states correlate with good generalization.
Under label noise, the number of iterations needed to fit the training data with the supervised contrastive loss grows superlinearly with the amount of noise, unlike with cross-entropy.
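The simplex geometry named in the findings can be made concrete with a small numerical check. The sketch below is illustrative only and is not taken from the paper's code: the helper name `simplex_vertices` and the choice of K = 4 classes are assumptions. It builds K unit vectors that form a regular simplex centered at the origin and verifies that every pair of distinct vertices has the same inner product, -1/(K-1).

```python
import numpy as np

def simplex_vertices(k: int) -> np.ndarray:
    """Return k unit vectors in R^k forming a regular simplex centered at the origin.

    Hypothetical helper for illustration; not from the paper.
    """
    # Subtract the centroid from the standard basis vectors, then rescale to unit norm.
    centered = np.eye(k) - np.full((k, k), 1.0 / k)
    return centered / np.linalg.norm(centered, axis=1, keepdims=True)

K = 4  # number of classes (assumed for the example)
V = simplex_vertices(K)
gram = V @ V.T  # pairwise inner products between class vertices

off_diagonal = gram[~np.eye(K, dtype=bool)]
print("norms:", np.linalg.norm(V, axis=1))
print("pairwise inner products:", np.unique(np.round(off_diagonal, 6)))
```

All vertices lie on the unit hypersphere, sum to zero, and are equally far apart, which is what "regular simplex inscribed in a hypersphere" means for class representations.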
Abstract
Minimizing cross-entropy over the softmax scores of a linear map composed with a high-capacity encoder is arguably the most popular choice for training neural networks on supervised learning tasks. However, recent works show that one can directly optimize the encoder instead, to obtain equally (or even more) discriminative representations via a supervised variant of a contrastive objective. In this work, we address the question whether there are fundamental differences in the sought-for representation geometry in the output space of the encoder at minimal loss. Specifically, we prove, under mild assumptions, that both losses attain their minimum once the representations of each class collapse to the vertices of a regular simplex, inscribed in a hypersphere. We provide empirical evidence that this configuration is attained in practice and that reaching a close-to-optimal state typically…
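As a rough illustration of the supervised contrastive objective discussed in the abstract, the sketch below implements one commonly used form of the loss in NumPy and evaluates it on two configurations: class representations collapsed onto simplex vertices, and random points on the unit sphere. The function name `supcon_loss`, the temperature `tau`, the batch layout, and the averaging over anchors are assumptions made for this example; the paper's exact formulation may differ in normalization.

```python
import numpy as np

def supcon_loss(z: np.ndarray, y: np.ndarray, tau: float = 1.0) -> float:
    """Supervised contrastive loss for unit-norm rows z (n, d) and labels y (n,).

    Sketch of a commonly used form; names and normalization are assumptions.
    """
    n = z.shape[0]
    logits = (z @ z.T) / tau
    not_self = ~np.eye(n, dtype=bool)
    # Log-sum-exp over all other samples a != i (numerically stabilized).
    masked = np.where(not_self, logits, -np.inf)
    m = masked.max(axis=1, keepdims=True)
    log_denom = m + np.log(np.exp(masked - m).sum(axis=1, keepdims=True))
    log_prob = logits - log_denom
    # Positives: other samples in the batch sharing the anchor's label.
    positives = (y[:, None] == y[None, :]) & not_self
    n_pos = positives.sum(axis=1)
    has_pos = n_pos > 0
    pos_sum = np.where(positives, log_prob, 0.0).sum(axis=1)
    per_anchor = -pos_sum[has_pos] / n_pos[has_pos]
    return float(per_anchor.mean())

rng = np.random.default_rng(0)
K, per_class = 4, 3  # assumed toy setup: 4 classes, 3 samples each
y = np.repeat(np.arange(K), per_class)

# Collapsed configuration: every sample of class k sits on simplex vertex k.
centered = np.eye(K) - 1.0 / K
vertices = centered / np.linalg.norm(centered, axis=1, keepdims=True)
z_simplex = vertices[y]

# Baseline: random points on the unit sphere.
z_random = rng.normal(size=(len(y), K))
z_random /= np.linalg.norm(z_random, axis=1, keepdims=True)

loss_simplex = supcon_loss(z_simplex, y)
loss_random = supcon_loss(z_random, y)
print(f"simplex: {loss_simplex:.4f}  random: {loss_random:.4f}")
```

In this toy setup the collapsed simplex configuration yields a lower loss than the random one, in line with the claim that the loss is minimized when each class collapses to a simplex vertex.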
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Generative Adversarial Networks and Image Synthesis
Methods: Softmax
