Zero-Shot Learning Through Cross-Modal Transfer

Richard Socher; Milind Ganjoo; Hamsa Sridhar; Osbert Bastani; Christopher D. Manning; Andrew Y. Ng

arXiv:1301.3666 · cs.CV · March 21, 2013 · ICLR · 807 cites

TL;DR

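This paper presents a zero-shot learning model that uses knowledge from large unsupervised text corpora to recognize objects in images for which no training images exist. It reaches state-of-the-art performance on classes with thousands of training images and reasonable performance on unseen classes, without manually defined semantic features.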

Contribution

The paper introduces a cross-modal transfer approach that uses semantic information learned from language for image recognition and handles both seen and unseen classes.

Findings

01. Achieves state-of-the-art performance on classes with many training images.

02. Performs reasonably well on unseen classes without training data.

03. Does not require manually defined semantic features.

Abstract

This work introduces a model that can recognize objects in images even if no training data is available for the objects. The only necessary knowledge about the unseen categories comes from unsupervised large text corpora. In our zero-shot framework, distributional information in language can be seen as spanning a semantic basis for understanding what objects look like. Most previous zero-shot learning models can only differentiate between unseen classes. In contrast, our model can both obtain state-of-the-art performance on classes that have thousands of training images and obtain reasonable performance on unseen classes. This is achieved by first using outlier detection in the semantic space and then two separate recognition models. Furthermore, our model does not require any manually defined semantic features for either words or images.
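
The two-stage decision described in the abstract (outlier detection in the semantic space, then one of two recognition models) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the toy 2-D vectors stand in for word vectors, the outlier rule is a simple distance threshold to the nearest seen-class vector, both recognition models are nearest-vector lookups, and the names `predict`, `seen_vecs`, `unseen_vecs`, and `threshold` are hypothetical. The input `z` is assumed to be an image that has already been mapped into the semantic space.

```python
import numpy as np


def predict(z, seen_vecs, seen_names, unseen_vecs, unseen_names, threshold):
    """Classify a point z that already lives in the semantic space.

    Returns (class_name, is_outlier). The threshold rule and both
    nearest-vector classifiers are illustrative assumptions.
    """
    # Stage 1 (assumed rule): z is an outlier if it is farther than
    # `threshold` from every seen-class vector.
    d_seen = np.linalg.norm(seen_vecs - z, axis=1)
    if d_seen.min() <= threshold:
        # Stage 2a: seen-class model (here: nearest seen-class vector).
        return seen_names[int(d_seen.argmin())], False
    # Stage 2b: unseen-class model (here: nearest unseen-class vector).
    d_unseen = np.linalg.norm(unseen_vecs - z, axis=1)
    return unseen_names[int(d_unseen.argmin())], True


# Toy semantic space: hypothetical 2-D stand-ins for word vectors.
seen_names = ["dog", "car"]
seen_vecs = np.array([[0.0, 0.0], [10.0, 0.0]])
unseen_names = ["cat", "truck"]
unseen_vecs = np.array([[0.0, 3.0], [10.0, 3.0]])

near_dog = predict(np.array([0.2, 0.1]), seen_vecs, seen_names,
                   unseen_vecs, unseen_names, threshold=1.0)
near_cat = predict(np.array([0.5, 2.8]), seen_vecs, seen_names,
                   unseen_vecs, unseen_names, threshold=1.0)
print(near_dog)  # ('dog', False)
print(near_cat)  # ('cat', True)
```

The point of the split is that a single classifier trained only on seen classes would never output an unseen label; routing outliers to a separate model is what lets one system cover both cases.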

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · COVID-19 diagnosis using AI · Multimodal Machine Learning Applications