Learning From Noisy Large-Scale Datasets With Minimal Supervision
Andreas Veit, Neil Alldrin, Gal Chechik, Ivan Krasin, Abhinav Gupta,, Serge Belongie

TL;DR
This paper introduces a multi-task learning approach that leverages noisy large-scale datasets and a small clean subset to improve image classification, effectively reducing noise and enhancing performance across diverse classes.
Contribution
The authors propose a novel method that jointly cleans noisy annotations and trains classifiers, outperforming traditional fine-tuning methods on large, noisy datasets.
Findings
Outperforms direct fine-tuning across all categories.
Effective for classes with high annotation noise (20-80%).
Demonstrates significant improvements on the Open Images dataset.
Abstract
We present an approach to effectively use millions of images with noisy annotations in conjunction with a small subset of cleanly-annotated images to learn powerful image representations. One common approach to combine clean and noisy data is to first pre-train a network using the large noisy dataset and then fine-tune with the clean dataset. We show this approach does not fully leverage the information contained in the clean set. Thus, we demonstrate how to use the clean annotations to reduce the noise in the large dataset before fine-tuning the network using both the clean set and the full set with reduced noise. The approach comprises a multi-task network that jointly learns to clean noisy annotations and to accurately classify images. We evaluate our approach on the recently released Open Images dataset, containing ~9 million images, multiple annotations per image and over 6000…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsDomain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · COVID-19 diagnosis using AI
