The Curious Layperson: Fine-Grained Image Recognition without Expert Labels

Subhabrata Choudhury; Iro Laina; Christian Rupprecht; Andrea Vedaldi

arXiv:2111.03651 · cs.CV · November 8, 2021

TL;DR

This paper introduces a novel approach for fine-grained image recognition that leverages web encyclopedia knowledge and non-expert image descriptions to match images with textual information without requiring expert annotations.

Contribution

It proposes a method to perform fine-grained recognition using web-based knowledge and non-expert descriptions, bypassing the need for expert-labeled data.

Findings

01 Effective in matching images with textual descriptions

02 Outperforms several strong baselines in cross-modal retrieval

03 Validated on two datasets with competitive results

Abstract

Most of us are not experts in specific fields, such as ornithology. Nonetheless, we do have general image and language understanding capabilities that we use to match what we see to expert resources. This allows us to expand our knowledge and perform novel tasks without ad-hoc external supervision. On the contrary, machines have a much harder time consulting expert-curated knowledge bases unless trained specifically with that knowledge in mind. Thus, in this paper we consider a new problem: fine-grained image recognition without expert annotations, which we address by leveraging the vast knowledge available in web encyclopedias. First, we learn a model to describe the visual appearance of objects using non-expert image descriptions. We then train a fine-grained textual similarity model that matches image descriptions with documents on a sentence-level basis. We evaluate the method on…
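To make the sentence-level matching idea in the abstract concrete, here is a minimal sketch in Python. It is not the authors' model: the paper trains a textual similarity model, whereas this sketch swaps in plain bag-of-words cosine similarity as a stand-in. The function names (`score_document`, `rank_documents`), the toy documents, and the aggregation rule (best-matching document sentence per description sentence, then averaged) are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def split_sentences(document):
    """Naive sentence splitter on ., ! and ? (an assumption for this sketch)."""
    return [s.strip() for s in re.split(r"[.!?]+", document) if s.strip()]


def score_document(description_sentences, document):
    """Score a document against an image description, sentence by sentence.

    Each description sentence is matched to its most similar document
    sentence; the per-sentence maxima are then averaged.
    """
    doc_vectors = [Counter(tokenize(s)) for s in split_sentences(document)]
    if not doc_vectors or not description_sentences:
        return 0.0
    total = 0.0
    for sentence in description_sentences:
        query = Counter(tokenize(sentence))
        total += max(cosine(query, d) for d in doc_vectors)
    return total / len(description_sentences)


def rank_documents(description_sentences, documents):
    """Return document names sorted from best to worst match."""
    scores = {
        name: score_document(description_sentences, text)
        for name, text in documents.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy data: two short "encyclopedia" entries and a
# non-expert description of an image.
documents = {
    "cardinal": "The bird has a bright red body. "
                "It has a black face and a short thick beak.",
    "blue_jay": "The bird has blue wings and a white chest. "
                "It has a black collar around the neck.",
}
description = ["a red bird with a black face", "it has a short beak"]
ranking = rank_documents(description, documents)
print(ranking)
```

Matching at the sentence level, rather than comparing the whole description with the whole document, lets a single distinctive sentence in a long article contribute strongly to the score. A learned similarity model would replace `cosine` in this sketch.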

Code & Models

Repositories

subhc/clever (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques