Exploiting Web Images for Dataset Construction: A Domain Robust Approach
Yazhou Yao, Jian Zhang, Fumin Shen, Xiansheng Hua, Jingsong Xu, and Zhenmin Tang

TL;DR
This paper introduces a novel framework for automatically constructing image datasets from web images that maintains domain robustness and reduces bias, using multi-instance learning and semantic expansion.
Contribution
The paper presents a new dataset construction method that improves domain adaptation by filtering irrelevant images through semantic expansion and multi-instance learning.
Findings
The constructed dataset shows strong domain robustness in classification tasks.
The method effectively filters noisy images, enhancing dataset quality.
Experiments demonstrate improved cross-dataset generalization.
Abstract
Labelled image datasets have played a critical role in high-level image understanding. However, the process of manual labelling is both time-consuming and labor intensive. To reduce the cost of manual labelling, there has been increased research interest in automatically constructing image datasets by exploiting web images. Datasets constructed by existing methods tend to have a weak domain adaptation ability, which is known as the "dataset bias problem". To address this issue, we present a novel image dataset construction framework that can be generalized well to unseen target domains. Specifically, the given queries are first expanded by searching the Google Books Ngrams Corpus to obtain a rich semantic description, from which the visually non-salient and less relevant expansions are filtered out. By treating each selected expansion as a "bag" and the retrieved images as "instances",…
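The bag/instance framing in the abstract can be illustrated with a small sketch. It assumes the standard multi-instance convention that a bag is scored by its best instance (max pooling); the abstract is truncated before it describes the authors' actual formulation, so this is not their method. The names `Bag`, `build_bags`, `bag_score`, `filter_bags`, and the per-image relevance scores are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Bag:
    """One query expansion (the bag) with its retrieved image IDs (the instances)."""
    expansion: str
    instances: List[str] = field(default_factory=list)


def build_bags(retrieved: Dict[str, List[str]]) -> List[Bag]:
    """Group retrieved image IDs into one bag per query expansion."""
    return [Bag(expansion=exp, instances=list(imgs)) for exp, imgs in retrieved.items()]


def bag_score(bag: Bag, instance_score: Callable[[str], float]) -> float:
    """Assumed MIL max-pooling rule: a bag scores as high as its best instance.

    An empty bag scores 0.0.
    """
    return max((instance_score(img) for img in bag.instances), default=0.0)


def filter_bags(bags: List[Bag], instance_score: Callable[[str], float],
                threshold: float) -> List[Bag]:
    """Keep only bags whose score reaches the (hypothetical) threshold."""
    return [b for b in bags if bag_score(b, instance_score) >= threshold]


# Toy data: expansions of the query "dog" with made-up relevance scores.
retrieved = {"police dog": ["img_a", "img_b"], "hot dog": ["img_c"]}
scores = {"img_a": 0.9, "img_b": 0.2, "img_c": 0.1}

bags = build_bags(retrieved)
kept = filter_bags(bags, scores.get, threshold=0.5)
print([b.expansion for b in kept])
```

In this toy run the "hot dog" bag is dropped because none of its instances scores above the threshold, which mirrors the idea of discarding less relevant expansions as whole bags rather than judging images one at a time.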
