TL;DR
This paper demonstrates that zero-shot fine-grained image classification can be achieved effectively using image and class embeddings, with unsupervised embeddings learned from text outperforming the previous supervised state of the art on challenging datasets.
Contribution
It introduces a compatibility learning approach for zero-shot classification that leverages both supervised and unsupervised output embeddings, achieving state-of-the-art results.
Findings
Unsupervised text-derived embeddings outperform the previous supervised state of the art.
Combining multiple embeddings improves classification accuracy.
The approach sets a new state of the art on the Animals with Attributes and Caltech-UCSD Birds datasets.
Abstract
Image classification has advanced significantly in recent years with the availability of large-scale image sets. However, fine-grained classification remains a major challenge due to the annotation cost of large numbers of fine-grained categories. This project shows that compelling classification performance can be achieved on such categories even without labeled training data. Given image and class embeddings, we learn a compatibility function such that matching embeddings are assigned a higher score than mismatching ones; zero-shot classification of an image proceeds by finding the label yielding the highest joint compatibility score. We use state-of-the-art image features and focus on different supervised attributes and unsupervised output embeddings either derived from hierarchies or learned from unlabeled text corpora. We establish a substantially improved state-of-the-art on the…
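The compatibility idea described in the abstract can be illustrated with a minimal sketch. It assumes a bilinear compatibility score F(x, y) = θ(x)ᵀ W φ(y) and a simple margin-based ranking update; the excerpt above states neither, so both are assumptions, and the names `compatibility`, `predict`, and `ranking_step` are hypothetical helpers rather than the authors' code.

```python
import numpy as np


def compatibility(theta_x, W, phi_Y):
    """Score one image against every class.

    theta_x: image embedding, shape (d_img,)
    W:       compatibility matrix, shape (d_img, d_cls)  (assumed bilinear form)
    phi_Y:   class embeddings, one row per class, shape (n_classes, d_cls)
    """
    return theta_x @ W @ phi_Y.T


def predict(theta_x, W, phi_Y):
    """Zero-shot prediction: the class whose embedding scores highest."""
    return int(np.argmax(compatibility(theta_x, W, phi_Y)))


def ranking_step(W, theta_x, y_true, phi_Y, lr=0.1, margin=1.0):
    """One illustrative SGD step pushing the matching class above the others.

    Picks the most-violating wrong class and, if it is within the margin,
    moves W so the true class scores higher. This is an assumed training
    rule for the sketch, not a claim about the paper's exact objective.
    """
    scores = compatibility(theta_x, W, phi_Y)
    violations = margin + scores - scores[y_true]
    violations[y_true] = -np.inf  # never pick the true class as the violator
    y_hat = int(np.argmax(violations))
    if violations[y_hat] > 0:
        W = W + lr * np.outer(theta_x, phi_Y[y_true] - phi_Y[y_hat])
    return W


# Toy usage: two training classes described by 2-d class embeddings.
phi_train = np.array([[1.0, 0.0], [0.0, 1.0]])
x = np.array([1.0, 0.0])  # image embedding of an image from class 0
W = ranking_step(np.zeros((2, 2)), x, y_true=0, phi_Y=phi_train)

# At test time, only the class-embedding matrix changes: rows of phi_test
# would describe classes that were never seen during training.
phi_test = np.array([[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]])
label = predict(x, W, phi_test)
```

Because the learned parameters live in `W` rather than in per-class weights, swapping `phi_train` for `phi_test` is all that is needed to score unseen classes, which is what makes the setup zero-shot.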
