How does Labeling Error Impact Contrastive Learning? A Perspective from Data Dimensionality Reduction

Jun Chen; Hong Chen; Yonghua Yu; Yiming Ying

arXiv:2507.11161 · stat.ML · July 18, 2025

TL;DR

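This paper examines how labeling error, the failure of augmented views to keep the label of their source sample, affects contrastive learning. It shows that labeling error raises downstream classification risk, proposes SVD-based dimensionality reduction to reduce false positives, and recommends augmentation strategies that balance dimensionality reduction against data connectivity.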

Contribution

The paper offers a theoretical and empirical analysis of how labeling error affects contrastive learning, and introduces data dimensionality reduction as a way to mitigate those effects.

Findings

01. Labeling errors significantly increase downstream classification risk.

02. SVD can reduce false positives but may also impair data connectivity.

03. Moderate embedding dimensions and data augmentation improve performance.
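
To make the second finding concrete, here is a minimal sketch of rank-truncated SVD applied to a single 2-D array standing in for one image channel. This page does not say exactly how the paper applies SVD to its data, so the function name `truncated_svd_reconstruct`, the 32×32 random input, and the choice of `rank=8` are all illustrative assumptions rather than the authors' procedure.

```python
import numpy as np


def truncated_svd_reconstruct(x: np.ndarray, rank: int) -> np.ndarray:
    """Return the best rank-`rank` approximation of a 2-D array.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if x.ndim != 2:
        raise ValueError("expected a 2-D array")
    if not 1 <= rank <= min(x.shape):
        raise ValueError("rank must be between 1 and min(x.shape)")
    # Thin SVD: x = u @ diag(s) @ vt, singular values sorted descending.
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    # Keep only the `rank` largest singular components.
    return (u[:, :rank] * s[:rank]) @ vt[:rank, :]


# Assumed toy input: a random 32x32 "image channel".
rng = np.random.default_rng(0)
image = rng.random((32, 32))
low_rank = truncated_svd_reconstruct(image, rank=8)

# Relative reconstruction error shrinks as the retained rank grows.
rel_error = np.linalg.norm(image - low_rank) / np.linalg.norm(image)
print(f"relative error at rank 8: {rel_error:.3f}")
```

A smaller `rank` discards more fine detail. That is plausibly the trade-off the finding describes: less detail may mean fewer false positive pairs, but also less overlap between augmented views.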

Abstract

In recent years, contrastive learning has achieved state-of-the-art performance in the field of self-supervised representation learning. Many previous works have attempted to provide a theoretical understanding of the success of contrastive learning. Almost all of them rely on a default assumption, the label consistency assumption, which may not hold in practice because of the strength and randomness of common augmentation strategies such as random resized crop (RRC); the probability that it fails is called the labeling error. This paper investigates the theoretical impact of labeling error on the downstream classification performance of contrastive learning. We first reveal several significant negative impacts of labeling error on downstream classification risk. To mitigate these impacts, a data dimensionality reduction method (e.g., singular value decomposition, SVD) is…

Videos

How does Labeling Error Impact Contrastive Learning? A Perspective from Data Dimensionality Reduction · SlidesLive

Taxonomy

Topics: Machine Learning and Data Classification · Statistics Education and Methodologies