Generalizing Supervised Contrastive learning: A Projection Perspective
Minoh Jeong, Alfred Hero

TL;DR
This paper introduces ProjNCE, a generalized contrastive loss that unifies supervised and self-supervised learning, comes with a theoretical mutual information (MI) bound, and improves performance on image and audio tasks.
Contribution
It proposes ProjNCE, a contrastive loss that constitutes a valid MI bound and allows flexible projection strategies for class embeddings, and demonstrates its effectiveness over existing methods.
Findings
ProjNCE outperforms SupCon and cross-entropy in experiments on vision and audio datasets.
The paper proves that ProjNCE is a valid bound on mutual information.
Flexible projection strategies improve class embedding quality.
Abstract
Self-supervised contrastive learning (SSCL) has emerged as a powerful paradigm for representation learning and has been studied from multiple perspectives, including mutual information and geometric viewpoints. However, supervised contrastive (SupCon) approaches have received comparatively little attention in this context: for instance, while InfoNCE used in SSCL is known to form a lower bound on mutual information (MI), the relationship between SupCon and MI remains unexplored. To address this gap, we introduce ProjNCE, a generalization of the InfoNCE loss that unifies supervised and self-supervised contrastive objectives by incorporating projection functions and an adjustment term for negative pairs. We prove that ProjNCE constitutes a valid MI bound and affords greater flexibility in selecting projection strategies for class embeddings. Building on this flexibility, we further…
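As background for the abstract's claim that InfoNCE lower-bounds MI while SupCon's relationship to MI is unexplored, the following is a minimal NumPy sketch of the two standard losses. It is not the paper's ProjNCE: the projection functions and the adjustment term are not reproduced here. The function names (`info_nce`, `mi_lower_bound`, `sup_con`), the cosine-similarity critic, and the temperature value are assumptions made for illustration.

```python
import numpy as np


def _log_softmax_rows(logits):
    # Numerically stable row-wise log-softmax.
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def _l2_normalize(z):
    return z / np.linalg.norm(z, axis=1, keepdims=True)


def info_nce(z1, z2, temperature=0.1):
    """Standard InfoNCE: row i of z1 is the positive for row i of z2."""
    z1, z2 = _l2_normalize(z1), _l2_normalize(z2)
    logits = z1 @ z2.T / temperature          # (N, N) similarity scores
    log_prob = _log_softmax_rows(logits)
    return -np.mean(np.diag(log_prob))        # positives sit on the diagonal


def mi_lower_bound(loss, n):
    """Well-known InfoNCE bound: I(X; Y) >= log(N) - L_InfoNCE."""
    return np.log(n) - loss


def sup_con(z, labels, temperature=0.1):
    """Standard SupCon: positives are all other samples with the same label."""
    z = _l2_normalize(z)
    n = z.shape[0]
    self_mask = np.eye(n, dtype=bool)
    logits = np.where(self_mask, -np.inf, z @ z.T / temperature)  # drop self-pairs
    log_prob = _log_softmax_rows(logits)
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask
    pos_count = pos_mask.sum(axis=1)
    valid = pos_count > 0                     # skip anchors with no positives
    pos_log_prob = np.where(pos_mask, log_prob, 0.0).sum(axis=1)
    return np.mean(-pos_log_prob[valid] / pos_count[valid])


rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))                  # toy embeddings (hypothetical data)
labels = np.array([0, 0, 1, 1, 2, 2, 3, 3])

loss_nce = info_nce(z, z, temperature=0.05)
bound = mi_lower_bound(loss_nce, n=8)
loss_sup = sup_con(z, labels)
```

Because the InfoNCE loss is nonnegative, the bound can never exceed log(N), which is one reason large batches are often used when it serves as an MI estimate. SupCon averages over several positives per anchor, so this simple bound does not carry over directly; that is the gap the abstract describes.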
Peer Reviews
Decision: Submitted to ICLR 2026
1. Strong linear-probe results on both vision and audio datasets, outperforming CE and SupCon, with notable robustness to noisy labels.
2. The paper evaluates multiple projection strategies (Orthogonal, Median, MLP), showing consistent gains.
3. The derivation of ProjNCE as a proper MI lower bound has good theoretical motivation.
1. The paper does not compare against other self-supervised methods such as BYOL, Barlow Twins, DINO, VICReg, or PaCo, which limits its positioning relative to the state of the art.
2. The paper does not include sensitivity studies on key hyperparameters such as $\beta$, the choice of kernel bandwidth, or embedding dimensionality.
3. I am not sure that the claim on line 154, that SupCon is mathematically equivalent to $I^{\text{self-p}}_{NCE}(Z;C)$, is accurate. Substituting $g
- The paper discusses how SupCon relates to mutual information, which is an unexplored and interesting topic.
- The derivation of ProjNCE as a generalized form of InfoNCE is clearly presented and mathematically sound.
- The idea of decoupling projection functions for positive and negative pairs is well motivated.
- It remains unclear whether the observed improvements stem from the proposed MI-based formulation itself or merely from the additional parameters introduced by the projection functions.
- The practical benefit of maintaining a "valid MI bound" is not convincingly demonstrated: there is no evidence that tighter MI bounds correlate with better downstream performance or robustness.
- The comparisons omit several recent strong baselines trained on larger-scale datasets, which weakens the empirical s
- Proposition 2.1 is quite an interesting result. Its derivation is simple, but it generalizes InfoNCE (so the InfoNCE bound is preserved) while pointing out that the adjustment term is missing in SupCon.
- The adjustment term has an intuitive meaning: it pulls $g_-(c_k)$ toward $f(x)$ while pushing $g_+(c_k)$ away from $f(x)$. SupCon does this implicitly, but without explicit guidance.
- Strong empirical results. The method seems to achieve a significant improvement over previous methods.
- I think the concept of label representation has already been presented in other papers [1, 2]. The authors should discuss this concept and explain how ProjNCE differs from these methods.
[1] Khalid et al., Label Supervised Contrastive Learning for Imbalanced Text Classification in Euclidean and Hyperbolic Embedding Spaces.
[2] Lopez-Martin et al., Supervised contrastive learning over prototype-label embeddings for network intrusion detection.
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Face recognition and analysis · Generative Adversarial Networks and Image Synthesis
Methods: Softmax · Attention Is All You Need · InfoNCE · Contrastive Learning
