WebVision Database: Visual Learning and Understanding from Web Data
Wen Li, Limin Wang, Wei Li, Eirikur Agustsson, Luc Van Gool

TL;DR
This paper introduces the WebVision database, a large-scale noisy web image dataset with meta information, demonstrating its effectiveness for training deep models and exploring domain adaptation in visual recognition.
Contribution
We created the WebVision database with over 2.4 million web images and meta data, enabling research on learning from noisy web data and domain adaptation.
Findings
Noisy web images suffice for training effective deep CNN models.
Models trained on WebVision generalize as well as, or better than, models trained on ILSVRC 2012 when transferred to new datasets.
Dataset bias between web data and ILSVRC 2012 is observed, which makes WebVision a candidate benchmark for visual domain adaptation research.
Abstract
In this paper, we present a study on learning visual recognition models from large-scale noisy web data. We build a new database called WebVision, which contains more than 2.4 million web images crawled from the Internet by using queries generated from the 1,000 semantic concepts of the benchmark ILSVRC 2012 dataset. Meta information accompanying those web images (e.g., title, description, tags, etc.) is also crawled. A validation set and a test set containing human-annotated images are also provided to facilitate algorithmic development. Based on our new database, we obtain a few interesting observations: 1) the noisy web images are sufficient for training a good deep CNN model for visual recognition; 2) the model learnt from our WebVision database exhibits comparable or even better generalization ability than the one trained from the ILSVRC 2012 dataset when being transferred to new…
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques
