How does Labeling Error Impact Contrastive Learning? A Perspective from Data Dimensionality Reduction
Jun Chen, Hong Chen, Yonghua Yu, Yiming Ying

TL;DR
This paper examines how labeling error affects contrastive learning. It shows that labeling error harms downstream classification, proposes SVD-based dimensionality reduction as a mitigation, and offers practical augmentation strategies that balance data reduction against connectivity.
Contribution
The paper offers a theoretical and empirical analysis of how labeling error affects contrastive learning, and introduces data dimensionality reduction to mitigate those effects.
Findings
Labeling errors significantly increase downstream classification risk.
SVD can reduce false positives but may also impair data connectivity.
Moderate embedding dimensions and data augmentation improve performance.
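The SVD finding above can be illustrated with a minimal sketch of rank truncation. It assumes the reduction is applied to each image as a 2-D array before augmentation, which this summary does not spell out. The function name truncate_svd, the rank k, and the random stand-in image are hypothetical and are not taken from the paper.

```python
import numpy as np


def truncate_svd(image, k):
    """Return the best rank-k approximation of a 2-D array.

    Hypothetical helper for illustration only; not the paper's code.
    """
    # Thin SVD: image = u @ diag(s) @ vt, singular values sorted descending.
    u, s, vt = np.linalg.svd(image, full_matrices=False)
    k = min(k, s.size)
    # Keep only the k largest singular values and their vectors.
    return (u[:, :k] * s[:k]) @ vt[:k, :]


rng = np.random.default_rng(0)
image = rng.random((32, 32))          # stand-in for one grayscale image
low_rank = truncate_svd(image, k=8)   # k is an arbitrary illustrative choice

# Reconstruction error shrinks as more singular values are kept.
err_8 = np.linalg.norm(image - low_rank)
err_16 = np.linalg.norm(image - truncate_svd(image, k=16))
```

A smaller k discards more fine detail. That matches the trade-off stated in the findings: fewer false positives on one side, weaker data connectivity on the other.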
Abstract
In recent years, contrastive learning has achieved state-of-the-art performance in the field of self-supervised representation learning. Many previous works have attempted to provide a theoretical understanding of the success of contrastive learning. Almost all of them rely on a default assumption, i.e., the label consistency assumption, which may not hold in practice (the probability of failure is called labeling error) due to the strength and randomness of common augmentation strategies, such as random resized crop (RRC). This paper investigates the theoretical impact of labeling error on the downstream classification performance of contrastive learning. We first reveal several significant negative impacts of labeling error on downstream classification risk. To mitigate these impacts, a data dimensionality reduction method (e.g., singular value decomposition, SVD) is…
Taxonomy
Topics: Machine Learning and Data Classification · Statistics Education and Methodologies
