Webly Supervised Learning of Convolutional Networks
Xinlei Chen, Abhinav Gupta

TL;DR
This paper introduces a webly supervised learning approach for CNNs that leverages large-scale web data through a two-stage training process. The resulting network outperforms a fine-tuned, ImageNet-trained CNN on Pascal VOC 2012 and remains robust to noisy web data.
Contribution
The paper proposes a two-step, curriculum-inspired training method for CNNs on web data: an initial representation is learned from easy images and then adapted to harder, more realistic images, without relying on extensive manually labeled datasets.
Findings
Outperforms a fine-tuned CNN trained on ImageNet on Pascal VOC 2012
Achieves the best performance on VOC 2007 among detectors that use no VOC training data
Robust to noisy web data, performing comparably even with image search results from March 2013
Abstract
We present an approach to utilize large amounts of web data for learning CNNs. Specifically inspired by curriculum learning, we present a two-step approach for CNN training. First, we use easy images to train an initial visual representation. We then use this initial CNN and adapt it to harder, more realistic images by leveraging the structure of data and categories. We demonstrate that our two-stage CNN outperforms a fine-tuned CNN trained on ImageNet on Pascal VOC 2012. We also demonstrate the strength of webly supervised learning by localizing objects in web images and training an R-CNN style detector. It achieves the best performance on VOC 2007 where no VOC training data is used. Finally, we show our approach is quite robust to noise and performs comparably even when we use image search results from March 2013 (pre-CNN image search era).
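The easy-then-hard schedule described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it swaps the CNN for a small logistic-regression model, uses synthetic 2-D points in place of web images, and its second stage simply continues training at a lower learning rate rather than modeling the structure of data and categories. All names here (`make_data`, `train`, `accuracy`, the noise and learning-rate values) are hypothetical and chosen only for illustration.

```python
import math
import random


def predict(w, x):
    """Probability of class 1 under a logistic-regression model."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
    return 1.0 / (1.0 + math.exp(-z))


def train(weights, data, epochs, lr):
    """Plain SGD on log loss, starting from the given weights."""
    w = list(weights)
    for _ in range(epochs):
        for x, y in data:
            p = predict(w, x)
            # Gradient of the log loss with respect to w is (p - y) * x.
            w = [wi - lr * (p - y) * xi for wi, xi in zip(w, x)]
    return w


def make_data(n, noise, label_flip, rng):
    """Synthetic stand-in for web images (assumption, not the paper's data).

    Low noise and no flipped labels mimics 'easy' images; higher noise
    and some flipped labels mimics 'harder', noisier images.
    """
    data = []
    for _ in range(n):
        y = rng.randint(0, 1)
        center = 2.0 if y == 1 else -2.0
        # Two features plus a constant 1.0 acting as the bias input.
        x = [center + rng.gauss(0, noise), center + rng.gauss(0, noise), 1.0]
        if rng.random() < label_flip:
            y = 1 - y
        data.append((x, y))
    return data


def accuracy(w, data):
    """Fraction of samples whose thresholded prediction matches the label."""
    hits = sum((predict(w, x) >= 0.5) == (y == 1) for x, y in data)
    return hits / len(data)


rng = random.Random(0)
easy = make_data(200, noise=0.5, label_flip=0.0, rng=rng)
hard = make_data(200, noise=1.5, label_flip=0.1, rng=rng)
held_out = make_data(200, noise=1.5, label_flip=0.0, rng=rng)

# Stage 1: learn an initial model from easy, clean samples.
w_stage1 = train([0.0, 0.0, 0.0], easy, epochs=5, lr=0.1)

# Stage 2: start from the stage-1 weights and adapt to harder, noisier samples.
w_stage2 = train(w_stage1, hard, epochs=5, lr=0.01)

print("stage 1 accuracy on easy data:", accuracy(w_stage1, easy))
print("stage 2 accuracy on held-out hard data:", accuracy(w_stage2, held_out))
```

The point of the sketch is only the ordering: the second call to `train` is initialized from `w_stage1` instead of from scratch, which is the curriculum-style idea the abstract describes.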
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Human Pose and Action Recognition · Advanced Neural Network Applications
