Ditch the Denoiser: Emergence of Noise Robustness in Self-Supervised Learning from Data Curriculum
Wenquan Lu, Jiaqi Zhang, Hugues Van Assel, Randall Balestriero

TL;DR
This paper introduces a noise-robust self-supervised learning framework that trains on a curriculum of denoised and noisy data, enabling models to learn noise resilience without needing a denoiser during deployment.
Contribution
It proposes a novel self-supervised training method that internalizes noise robustness through a denoised-to-noisy curriculum and teacher-guided regularization, eliminating the need for a denoiser at inference.
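The denoised-to-noisy curriculum can be pictured as a schedule that decides which version of each sample the backbone sees at a given training step. The sketch below is only an illustration under stated assumptions: the function names, the single hard switch point, and the `switch_fraction` parameter are hypothetical and are not taken from the paper, which may use a different schedule.

```python
from typing import Callable, List, Sequence


def curriculum_source(step: int, total_steps: int, switch_fraction: float = 0.5) -> str:
    """Return which data variant to train on at `step`.

    Assumption: one hard switch from denoised to noisy data after a
    fixed fraction of training (hypothetical; not from the paper).
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    if not 0.0 <= switch_fraction <= 1.0:
        raise ValueError("switch_fraction must lie in [0, 1]")
    switch_step = int(switch_fraction * total_steps)
    return "denoised" if step < switch_step else "noisy"


def select_batch(
    step: int,
    total_steps: int,
    noisy_batch: Sequence[float],
    denoise_fn: Callable[[float], float],
    switch_fraction: float = 0.5,
) -> List[float]:
    """Apply the denoiser only during the early (denoised) phase."""
    if curriculum_source(step, total_steps, switch_fraction) == "denoised":
        return [denoise_fn(x) for x in noisy_batch]
    # Late phase: the backbone trains directly on the raw noisy samples.
    return list(noisy_batch)
```

In this sketch the denoiser is called only inside the training-time schedule, which mirrors the claim that no denoiser is needed once the backbone is deployed.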
Findings
Improves linear probing accuracy by 4.8% on noisy ImageNet-1k.
Enables noise robustness without a denoiser at inference.
Demonstrates effectiveness under extreme Gaussian noise conditions.
Abstract
Self-Supervised Learning (SSL) has become a powerful solution to extract rich representations from unlabeled data. Yet, SSL research is mostly focused on clean, curated and high-quality datasets. As a result, applying SSL to noisy data remains a challenge, despite being crucial to applications such as astrophysics, medical imaging, geophysics or finance. In this work, we present a fully self-supervised framework that enables noise-robust representation learning without requiring a denoiser at inference or downstream fine-tuning. Our method first trains an SSL denoiser on noisy data, then uses it to construct a denoised-to-noisy data curriculum (i.e., training first on denoised, then noisy samples) for pretraining an SSL backbone (e.g., DINOv2), combined with a teacher-guided regularization that anchors noisy embeddings to their denoised counterparts. This process encourages the model to…
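The teacher-guided regularization described in the abstract anchors the embedding of a noisy sample to the embedding of its denoised counterpart. A minimal sketch of one way such a term could look is given below. The squared-distance form, the additive combination with the SSL loss, and the names `anchor_loss`, `total_loss`, and `weight` are all assumptions made for illustration; the abstract does not specify the actual loss.

```python
from typing import Sequence

Embedding = Sequence[float]


def anchor_loss(noisy_emb: Sequence[Embedding], denoised_emb: Sequence[Embedding]) -> float:
    """Mean squared L2 distance between paired embeddings.

    Assumption: `denoised_emb` comes from a teacher that is not updated
    by this term, so it acts as a fixed target (hypothetical setup).
    """
    if len(noisy_emb) != len(denoised_emb) or not noisy_emb:
        raise ValueError("need equally sized, non-empty batches")
    total = 0.0
    for z_noisy, z_clean in zip(noisy_emb, denoised_emb):
        if len(z_noisy) != len(z_clean):
            raise ValueError("paired embeddings must have the same dimension")
        total += sum((a - b) ** 2 for a, b in zip(z_noisy, z_clean))
    return total / len(noisy_emb)


def total_loss(
    ssl_loss: float,
    noisy_emb: Sequence[Embedding],
    denoised_emb: Sequence[Embedding],
    weight: float = 1.0,
) -> float:
    """Add the weighted anchoring term to the SSL objective (illustrative)."""
    return ssl_loss + weight * anchor_loss(noisy_emb, denoised_emb)
```

The anchoring term is zero when the noisy and denoised embeddings coincide, so in this sketch it only penalizes the drift that noise introduces in representation space.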
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Machine Learning and Data Classification
