An Investigation of Representation and Allocation Harms in Contrastive Learning
Subha Maity, Mayank Agarwal, Mikhail Yurochkin, Yuekai Sun

TL;DR
This paper investigates how contrastive learning, a form of self-supervised learning, can cause representation harm by collapsing minority group representations, affecting downstream tasks, and provides a theoretical explanation for this phenomenon.
Contribution
It identifies and analyzes representation harm in contrastive learning, linking it to allocation harm and offering a theoretical model to explain the phenomenon.
Findings
Contrastive learning can collapse minority group representations.
Representation harm contributes to allocation harm in downstream tasks.
A theoretical model based on a stochastic block model explains representation harm as a form of neural collapse in contrastive learning.
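One rough way to picture the first finding is to compare how close a minority group's average embedding sits to a majority group's. The sketch below is an illustration only and not the paper's metric: the function names (`centroid`, `cosine`, `centroid_similarity`) and the toy embeddings are hypothetical, and cosine similarity between group centroids is just one assumed proxy for "collapse".

```python
import math

def centroid(vectors):
    """Coordinate-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / n for i in range(dim)]

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def centroid_similarity(embeddings, groups, g1, g2):
    """Cosine similarity between the centroids of groups g1 and g2.

    Assumed proxy: values near 1.0 suggest the two groups are hard to
    tell apart in the learned representation.
    """
    c1 = centroid([e for e, g in zip(embeddings, groups) if g == g1])
    c2 = centroid([e for e, g in zip(embeddings, groups) if g == g2])
    return cosine(c1, c2)

# Toy, hand-made embeddings (hypothetical data, not from the paper).
embeddings = [[1.0, 0.0], [0.9, 0.1], [1.0, 0.05], [0.0, 1.0]]
groups = ["majority", "majority", "minority", "other"]

sim_major = centroid_similarity(embeddings, groups, "minority", "majority")
sim_other = centroid_similarity(embeddings, groups, "minority", "other")
```

In this toy setup the minority centroid is far closer to the majority centroid than to the third group, which is the kind of pattern the finding describes.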
Abstract
The effect of underrepresentation on the performance of minority groups is known to be a serious problem in supervised learning settings; however, it has been underexplored so far in the context of self-supervised learning (SSL). In this paper, we demonstrate that contrastive learning (CL), a popular variant of SSL, tends to collapse representations of minority groups with certain majority groups. We refer to this phenomenon as representation harm and demonstrate it on image and text datasets using the corresponding popular CL methods. Furthermore, our causal mediation analysis of allocation harm on a downstream classification task reveals that representation harm is partly responsible for it, thus emphasizing the importance of studying and mitigating representation harm. Finally, we provide a theoretical explanation for representation harm using a stochastic block model that leads to a…
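For readers unfamiliar with stochastic block models, the following is a minimal sketch of how such a graph can be sampled with one small (minority) block. It is a generic textbook construction, not the paper's specific model, which this excerpt does not give: `sample_sbm`, the block sizes, and the edge probabilities `p_in` and `p_out` are all assumptions chosen for illustration.

```python
import random

def sample_sbm(block_sizes, p_in, p_out, seed=0):
    """Sample an undirected graph from a simple stochastic block model.

    block_sizes: number of nodes in each block (a small entry plays the
                 role of an underrepresented group).
    p_in:  edge probability for two nodes in the same block.
    p_out: edge probability for two nodes in different blocks.
    Returns (labels, adjacency matrix as a list of lists of 0/1).
    """
    rng = random.Random(seed)
    labels = [b for b, size in enumerate(block_sizes) for _ in range(size)]
    n = len(labels)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            p = p_in if labels[i] == labels[j] else p_out
            if rng.random() < p:
                adj[i][j] = adj[j][i] = 1  # keep the matrix symmetric
    return labels, adj

# Hypothetical sizes: two large blocks and one small minority block.
labels, adj = sample_sbm([40, 40, 5], p_in=0.5, p_out=0.05, seed=0)
```

In a contrastive-learning reading, an edge can be thought of as a pair of samples treated as similar; that interpretation is an assumption here, since the truncated abstract does not spell out the construction.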
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning
Methods: Contrastive Learning
