Link the head to the "beak": Zero Shot Learning from Noisy Text Description at Part Precision

Mohamed Elhoseiny; Yizhe Zhu; Han Zhang; Ahmed Elgammal

arXiv:1709.01148·cs.CV·September 6, 2017

TL;DR

This paper introduces a zero-shot learning framework that links unstructured text descriptions to specific visual parts of birds, enabling classification without training images and outperforming previous methods.

Contribution

The proposed method connects text terms to visual parts without part-text annotations, improving zero-shot recognition accuracy on bird datasets.

Findings

01

Achieved 43.6% zero-shot accuracy on CUBirds 2011, surpassing the previous result of 34.7%.

02

Outperformed existing methods on large-scale bird image benchmarks.

03

Enabled part-specific visual classification from unstructured text descriptions.

Abstract

In this paper, we study learning visual classifiers from unstructured text descriptions at part precision with no training images. We propose a learning framework that is able to connect text terms to their relevant parts and suppress connections to non-visual text terms without any part-text annotations. For instance, this learning process enables terms like "beak" to be sparsely linked to the visual representation of parts like the head, while reducing the effect of non-visual terms like "migrate" on classifier prediction. Images are encoded by a part-based CNN that detects bird parts and learns part-specific representations. Part-based visual classifiers are predicted from text descriptions of unseen visual classifiers to facilitate classification without training images (also known as zero-shot recognition). We performed our experiments on the CUBirds 2011 dataset and improves the…
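The idea of predicting part-based classifiers from text can be illustrated with a toy sketch. The abstract does not give the paper's actual formulation, so everything here is an assumption: a linear map from term weights to one classifier per part, a group-norm penalty as one common way to encourage sparse term-to-part links, and all names (`WEIGHTS`, `predict_part_classifiers`, `term_part_group_norm`) are hypothetical. It is not the authors' implementation.

```python
from math import sqrt

# Hypothetical toy setup: 3 text terms, 2 bird parts, 2-dim feature per part.
TERMS = ["beak", "migrate", "wing"]
PARTS = ["head", "wing"]

# WEIGHTS[p][k][j]: assumed learned weight linking term k to dimension j
# of part p's visual feature. "beak" links only to the head, "wing" only
# to the wing part, and the non-visual term "migrate" links to nothing.
WEIGHTS = [
    [[1.0, 0.0], [0.0, 0.0], [0.0, 0.0]],  # head
    [[0.0, 0.0], [0.0, 0.0], [0.0, 1.0]],  # wing
]


def predict_part_classifiers(text_vec, weights):
    """Map a class's term vector to one linear classifier per part."""
    classifiers = []
    for part_w in weights:
        dims = len(part_w[0])
        clf = [sum(text_vec[k] * part_w[k][j] for k in range(len(text_vec)))
               for j in range(dims)]
        classifiers.append(clf)
    return classifiers


def score(image_parts, classifiers):
    """Sum of per-part dot products between image features and classifiers."""
    return sum(sum(x * c for x, c in zip(feat, clf))
               for feat, clf in zip(image_parts, classifiers))


def zero_shot_predict(image_parts, class_texts, weights):
    """Pick the unseen class whose text-predicted classifiers score highest."""
    scores = {name: score(image_parts, predict_part_classifiers(t, weights))
              for name, t in class_texts.items()}
    return max(scores, key=scores.get)


def term_part_group_norm(weights):
    """Sum of L2 norms over (part, term) weight groups. Penalizing this
    during training would push whole term-to-part links to zero."""
    return sum(sqrt(sum(w * w for w in row))
               for part_w in weights for row in part_w)


# Two unseen classes described only by term weights (no training images).
CLASS_TEXTS = {
    "long_beaked": [1.0, 1.0, 0.0],   # mentions "beak" and "migrate"
    "broad_winged": [0.0, 1.0, 1.0],  # mentions "migrate" and "wing"
}
IMAGE_PARTS = [[0.9, 0.1], [0.2, 0.3]]  # head feature, wing feature

prediction = zero_shot_predict(IMAGE_PARTS, CLASS_TEXTS, WEIGHTS)
penalty = term_part_group_norm(WEIGHTS)
```

In this toy example, "migrate" appears in both class descriptions but has all-zero weights, so it does not affect either score. The image is assigned to the class whose visual terms match its strongest part response.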
