An Investigation of Representation and Allocation Harms in Contrastive Learning

Subha Maity; Mayank Agarwal; Mikhail Yurochkin; Yuekai Sun

arXiv:2310.01583 · stat.ML · October 4, 2023

TL;DR

This paper shows that contrastive learning, a popular form of self-supervised learning, tends to collapse the representations of minority groups with those of certain majority groups. It calls this representation harm, finds that it contributes to allocation harm on downstream tasks, and provides a theoretical explanation for the phenomenon.

Contribution

It identifies and analyzes representation harm in contrastive learning, linking it to allocation harm and offering a theoretical model to explain the phenomenon.

Findings

01. Contrastive learning can collapse minority group representations.

02. Representation harm contributes to allocation harm in downstream tasks.

03. A theoretical model explains representation harm via neural collapse in contrastive learning.

Abstract

The effect of underrepresentation on the performance of minority groups is known to be a serious problem in supervised learning settings; however, it has been underexplored so far in the context of self-supervised learning (SSL). In this paper, we demonstrate that contrastive learning (CL), a popular variant of SSL, tends to collapse representations of minority groups with certain majority groups. We refer to this phenomenon as representation harm and demonstrate it on image and text datasets using the corresponding popular CL methods. Furthermore, our causal mediation analysis of allocation harm on a downstream classification task reveals that representation harm is partly responsible for it, thus emphasizing the importance of studying and mitigating representation harm. Finally, we provide a theoretical explanation for representation harm using a stochastic block model that leads to a…

Code & Models

Repositories

smaityumich/cl-representation-harm (PyTorch, official)

Videos

An Investigation of Representation and Allocation Harms in Contrastive Learning · SlidesLive

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning

Methods: Contrastive Learning