Impact of Noisy Supervision in Foundation Model Learning
Hao Chen, Zihan Wang, Ran Tao, Hongxin Wei, Xing Xie, Masashi Sugiyama, Bhiksha Raj, Jindong Wang

TL;DR
This paper investigates the impact of label noise in large-scale pre-training datasets for foundation models, revealing that slight noise can help in-domain performance but harms out-of-domain generalization, and proposes a mitigation method called NMTune.
Contribution
It provides a comprehensive analysis of noise in pre-training datasets, demonstrating its effects on model generalization, and introduces NMTune to mitigate noise impacts across various models and tasks.
Findings
Slight noise benefits in-domain performance.
Noise deteriorates out-of-domain generalization.
NMTune effectively mitigates noise effects.
Abstract
Foundation models are usually pre-trained on large-scale datasets and then adapted to downstream tasks through tuning. However, the large-scale pre-training datasets, often inaccessible or too expensive to handle, can contain label noise that may adversely affect the generalization of the model and pose unexpected risks. This paper stands out as the first work to comprehensively understand and analyze the nature of noise in pre-training datasets and then effectively mitigate its impacts on downstream tasks. Specifically, through extensive experiments of fully-supervised and image-text contrastive pre-training on synthetic noisy ImageNet-1K, YFCC15M, and CC12M datasets, we demonstrate that, while slight noise in pre-training can benefit in-domain (ID) performance, where the training and testing data share a similar distribution, it always deteriorates out-of-domain (OOD) performance,…
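The abstract refers to synthetic noise on ImageNet-1K, but this excerpt does not say how the labels are corrupted. The following is a minimal sketch under the assumption of symmetric (uniform) label flipping, a common way to simulate label noise; it is not confirmed as the authors' procedure. The function name `inject_symmetric_label_noise` and its parameters are hypothetical and chosen only for illustration.

```python
import numpy as np


def inject_symmetric_label_noise(labels, noise_ratio, num_classes, seed=0):
    """Flip a fraction `noise_ratio` of labels to a different, uniformly chosen class.

    Assumption: symmetric label noise. The source excerpt does not specify
    the noise model used in the paper.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    noisy = labels.copy()

    # Number of samples whose label will be corrupted.
    n_flip = int(round(noise_ratio * len(labels)))
    flip_idx = rng.choice(len(labels), size=n_flip, replace=False)

    # An offset in [1, num_classes - 1] guarantees the new label differs
    # from the original one after the modulo wrap-around.
    offsets = rng.integers(1, num_classes, size=n_flip)
    noisy[flip_idx] = (labels[flip_idx] + offsets) % num_classes
    return noisy, flip_idx


if __name__ == "__main__":
    clean = np.arange(1000) % 10  # toy labels for 10 classes
    noisy, idx = inject_symmetric_label_noise(clean, noise_ratio=0.1, num_classes=10)
    print("fraction of changed labels:", float((noisy != clean).mean()))
```

Varying `noise_ratio` in a sketch like this gives a controlled way to compare in-domain and out-of-domain accuracy across noise levels, which is the kind of comparison the abstract describes.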
Taxonomy
Topics: Neural Networks and Applications
