A Refined Generalization Analysis for Extreme Multi-class Supervised Contrastive Representation Learning
Nong Minh Hieu, Antoine Ledent

TL;DR
This paper advances the theoretical understanding of contrastive learning in extreme multi-class settings by providing sharper sample complexity bounds that are independent of class distribution and tail class effects.
Contribution
It improves existing bounds to be proportional to the number of classes and introduces a new estimator for better risk concentration analysis across classes.
Findings
Sample complexity scales with the number of classes R.
New estimator captures risk concentration across classes.
Bounds are sharper in long-tailed class distributions.
Abstract
Contrastive Representation Learning (CRL) has achieved strong empirical success in multiple machine learning disciplines, yet its theoretical sample complexity remains poorly understood. Existing analyses usually assume that input tuples are independent and identically distributed (i.i.d.), an assumption violated in most practical settings, where contrastive tuples are constructed from a finite pool of labeled data, inducing dependencies among tuples. While one recent work analyzed this learning setting using U-Statistics to estimate the population risk, the techniques used therein require the risk of each class to concentrate uniformly, making the excess risk bounds depend on the probability of the rarest class. Such a dependency can be overly pessimistic in extreme multi-class settings where there are many tail classes which…
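The dependence among tuples mentioned in the abstract can be illustrated with a small sketch. The code below is not the paper's construction: it assumes, purely for illustration, that tuples are (anchor, positive, negative) triplets, and the helper names `build_triplets` and `shared_sample_fraction` are hypothetical. It enumerates every valid triplet from a toy labeled pool and measures how often two triplets reuse the same pool sample, which is what breaks the i.i.d. assumption.

```python
from itertools import permutations

def build_triplets(labels):
    """Enumerate all (anchor, positive, negative) index triplets from a labeled pool.

    Assumption for illustration: anchor and positive share a label,
    and the negative has a different label.
    """
    triplets = []
    for a, p in permutations(range(len(labels)), 2):
        if labels[a] != labels[p]:
            continue
        for n in range(len(labels)):
            if labels[n] != labels[a]:
                triplets.append((a, p, n))
    return triplets

def shared_sample_fraction(triplets):
    """Fraction of distinct triplet pairs that reuse at least one pool index."""
    total = shared = 0
    for i in range(len(triplets)):
        for j in range(i + 1, len(triplets)):
            total += 1
            if set(triplets[i]) & set(triplets[j]):
                shared += 1
    return shared / total if total else 0.0

# Toy pool: class 2 is a "tail" class with a single sample.
labels = [0, 0, 0, 1, 1, 2]
triplets = build_triplets(labels)
print("number of triplets:", len(triplets))
print("fraction of triplet pairs sharing a sample:", shared_sample_fraction(triplets))
```

In this toy pool, the single sample of class 2 never serves as an anchor because it has no positive partner, and it only appears as a negative. This gives a rough picture of why tail classes are awkward for analyses that need every class's risk to concentrate.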
