The Curious Layperson: Fine-Grained Image Recognition without Expert Labels
Subhabrata Choudhury, Iro Laina, Christian Rupprecht, Andrea Vedaldi

TL;DR
This paper introduces a novel approach for fine-grained image recognition that leverages web encyclopedia knowledge and non-expert image descriptions to match images with textual information without requiring expert annotations.
Contribution
It proposes a method to perform fine-grained recognition using web-based knowledge and non-expert descriptions, bypassing the need for expert-labeled data.
Findings
Effective in matching images with textual descriptions
Outperforms several strong baselines in cross-modal retrieval
Validated on two datasets with competitive results
Abstract
Most of us are not experts in specific fields, such as ornithology. Nonetheless, we do have general image and language understanding capabilities that we use to match what we see to expert resources. This allows us to expand our knowledge and perform novel tasks without ad-hoc external supervision. On the contrary, machines have a much harder time consulting expert-curated knowledge bases unless trained specifically with that knowledge in mind. Thus, in this paper we consider a new problem: fine-grained image recognition without expert annotations, which we address by leveraging the vast knowledge available in web encyclopedias. First, we learn a model to describe the visual appearance of objects using non-expert image descriptions. We then train a fine-grained textual similarity model that matches image descriptions with documents on a sentence-level basis. We evaluate the method on…
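The abstract's second stage matches image descriptions against encyclopedia documents "on a sentence-level basis". The sketch below shows one plausible way such sentence-level matching could rank documents. It is not the authors' implementation. The paper trains a learned textual similarity model, whereas this sketch substitutes a bag-of-words cosine similarity as a stand-in. The aggregation rule is also an assumption: each description sentence is matched to its best document sentence, and the scores are averaged. All function names (`embed`, `cosine`, `split_sentences`, `document_score`, `rank_documents`) are hypothetical.

```python
import math
import re
from collections import Counter


def embed(sentence: str) -> Counter:
    # Hypothetical stand-in for a learned sentence encoder: a lowercase bag of words.
    return Counter(re.findall(r"[a-z]+", sentence.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def split_sentences(text: str) -> list[str]:
    # Naive sentence splitter on terminal punctuation.
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]


def document_score(description: list[str], document: str) -> float:
    # Assumed aggregation: each description sentence keeps its best-matching
    # document sentence, then the best scores are averaged over the description.
    doc_vectors = [embed(s) for s in split_sentences(document)]
    if not description or not doc_vectors:
        return 0.0
    best = [max(cosine(embed(d), v) for v in doc_vectors) for d in description]
    return sum(best) / len(best)


def rank_documents(description: list[str], documents: dict[str, str]) -> list[tuple[str, float]]:
    # Score every candidate document (e.g. one encyclopedia article per class)
    # and sort from most to least similar.
    scores = {name: document_score(description, text) for name, text in documents.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

For example, with the (made-up) description `["a small bird with a red crest", "it has a short thick beak"]` and two toy articles, `{"Northern cardinal": "The male has a red crest. It has a short thick beak.", "Blue jay": "It has a blue crest. Its wings are blue, black and white."}`, `rank_documents` places "Northern cardinal" first. Scoring per sentence, rather than embedding whole documents, lets a short non-expert description match the few relevant sentences of a long article without being diluted by the rest.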
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques
