Multi-Instance Visual-Semantic Embedding

Zhou Ren; Hailin Jin; Zhe Lin; Chen Fang; Alan Yuille

arXiv:1512.06963·cs.CV·December 23, 2015·24 cites

TL;DR

This paper introduces a Multi-Instance visual-semantic Embedding model that effectively maps image subregions to labels, improving multi-label annotation and zero-shot learning performance.

Contribution

The paper presents a novel MIE model that handles multi-label images by discovering and mapping meaningful subregions to labels, advancing visual-semantic embedding techniques.

Findings

01. Outperforms state-of-the-art on multi-label image annotation

02. Achieves superior results in zero-shot learning

03. Effectively models complex image-label relationships

Abstract

Visual-semantic embedding models have recently been proposed and shown to be effective for image classification and zero-shot learning by mapping images into a continuous semantic label space. Although several approaches have been proposed for single-label embedding tasks, handling images with multiple labels (a more general setting) remains an open problem, mainly because of the complex underlying correspondence between an image and its labels. In this work, we present the Multi-Instance visual-semantic Embedding model (MIE) for embedding images associated with either single or multiple labels. Our model discovers semantically meaningful image subregions and maps them to their corresponding labels. We demonstrate the superiority of our method over the state of the art on two tasks: multi-label image annotation and zero-shot learning.
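
The idea of matching image subregions to labels in a shared embedding space can be illustrated with a small sketch. The abstract does not give the scoring function, so everything below is an assumption made for illustration: region and label embeddings are taken as precomputed vectors, distance is squared Euclidean, and each label is scored by its closest subregion. The names `label_distances` and `annotate` are hypothetical and are not taken from the paper.

```python
import numpy as np

def label_distances(region_embeddings, label_embeddings):
    """For each label, find the distance to its closest image subregion.

    region_embeddings: array of shape (R, D), one row per subregion.
    label_embeddings:  array of shape (L, D), one row per label.
    Returns (distances, best_region), both of shape (L,).
    Assumption: squared Euclidean distance, minimum taken over regions.
    """
    diffs = label_embeddings[:, None, :] - region_embeddings[None, :, :]  # (L, R, D)
    sq_dist = np.sum(diffs ** 2, axis=-1)                                 # (L, R)
    best_region = np.argmin(sq_dist, axis=1)                              # (L,)
    distances = sq_dist[np.arange(sq_dist.shape[0]), best_region]
    return distances, best_region

def annotate(region_embeddings, label_embeddings, label_names, top_k=2):
    """Return the top_k closest labels, each paired with its best-matching region index."""
    distances, best_region = label_distances(region_embeddings, label_embeddings)
    order = np.argsort(distances, kind="stable")[:top_k]
    return [(label_names[i], int(best_region[i])) for i in order]

# Toy data (made up): two subregions and three candidate labels in a 2-D space.
regions = np.array([[1.0, 0.0],
                    [0.0, 1.0]])
labels = np.array([[0.9, 0.1],     # "dog": close to region 0
                   [0.2, 0.8],     # "grass": close to region 1
                   [-1.0, -1.0]])  # "car": far from both regions
names = ["dog", "grass", "car"]

result = annotate(regions, labels, names, top_k=2)
print(result)
```

Under these assumptions, each predicted label comes with the index of the subregion that supports it. Because labels live in a continuous semantic space, scoring a label unseen during training would only require adding its embedding as a row of `labels`, which is the general mechanism that makes zero-shot prediction possible in embedding models of this kind.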

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques