Optimal Representations for Generalized Contrastive Learning with Imbalanced Datasets
Thuan Nguyen, Shuchin Aeron, D. Richard Brown III, Prakash Ishwar

TL;DR
This paper characterizes the geometry of optimal representations in contrastive learning on imbalanced datasets, showing how phenomena such as Neural Collapse and Minority Collapse arise depending on the class proportions.
Contribution
It provides a theoretical framework for understanding optimal contrastive representations under class imbalance, with a computable geometric characterization that applies to a large, generalized family of contrastive losses.
Findings
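Optimal representations of samples from the same class collapse to their class mean, and the class means exhibit an angular symmetry determined by the relative class proportions.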
Class imbalance can cause Minority Collapse, where minority class samples collapse into a single vector.
The geometry of optimal representations can be determined by solving a convex optimization problem.
Abstract
In this paper, we provide a computable characterization of the geometry of optimal representations in Contrastive Learning (CL) when the classes are imbalanced. When classes are balanced and the representation dimension is greater than the number of classes, it is well-known that the optimal representations exhibit Neural Collapse (NC), i.e., representations from the same class collapse to their class means and the class means form an Equiangular Tight Frame (ETF). For imbalanced classes and a large, generalized family of CL losses, we prove that the optimal representations of all samples from the same class collapse to their class means and their geometry exhibits an angular symmetry structure that is determined by the relative class proportions. In general, we show that the geometry can be determined by solving a convex optimization problem. Exploiting this symmetry structure, we…
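To make the balanced-case geometry mentioned in the abstract concrete, here is a minimal sketch that builds a standard simplex ETF and checks its two defining properties: unit-norm class means and a common pairwise cosine of -1/(K-1). The function names `simplex_etf` and `cosine` are illustrative and not taken from the paper, and the vectors are embedded in K dimensions purely for simplicity. This shows only the balanced Neural Collapse geometry, not the paper's imbalanced-case characterization.

```python
import math


def simplex_etf(num_classes):
    """Return K unit-norm vectors in R^K forming a simplex ETF (hypothetical helper)."""
    k = num_classes
    # Center the standard basis vectors, then rescale each one to unit length.
    scale = math.sqrt(k / (k - 1))
    return [
        [scale * ((1.0 if i == j else 0.0) - 1.0 / k) for j in range(k)]
        for i in range(k)
    ]


def cosine(u, v):
    """Cosine of the angle between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


num_classes = 4
class_means = simplex_etf(num_classes)

# Every pair of distinct class means should share the same cosine, -1/(K-1).
pair_cosines = [
    cosine(class_means[i], class_means[j])
    for i in range(num_classes)
    for j in range(num_classes)
    if i != j
]
print(min(pair_cosines), max(pair_cosines), -1.0 / (num_classes - 1))
```

The printed minimum and maximum cosines should agree with the last value up to floating-point error, which is the equiangular property. Under class imbalance, the abstract states that the angles instead depend on the relative class proportions.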
