Multi-Instance Visual-Semantic Embedding
Zhou Ren, Hailin Jin, Zhe Lin, Chen Fang, Alan Yuille

TL;DR
This paper introduces the Multi-Instance visual-semantic Embedding model (MIE), which maps image subregions to their corresponding labels and improves on prior methods for multi-label image annotation and zero-shot learning.
Contribution
The paper presents a novel MIE model that handles multi-label images by discovering and mapping meaningful subregions to labels, advancing visual-semantic embedding techniques.
Findings
Outperforms prior state-of-the-art methods on multi-label image annotation
Outperforms prior state-of-the-art methods on zero-shot learning
Effectively models complex image-label relationships
Abstract
Visual-semantic embedding models have recently been proposed and shown to be effective for image classification and zero-shot learning by mapping images into a continuous semantic label space. Although several approaches have been proposed for single-label embedding tasks, handling images with multiple labels (a more general setting) remains an open problem, mainly because of the complex correspondence between an image and its labels. In this work, we present the Multi-Instance visual-semantic Embedding model (MIE) for embedding images associated with either single or multiple labels. Our model discovers semantically meaningful image subregions and maps them to their corresponding labels. We demonstrate the superiority of our method over the state of the art on two tasks: multi-label image annotation and zero-shot learning.
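As a rough illustration of the multi-instance idea in the abstract (not the authors' implementation), the sketch below assumes that image subregions have already been embedded into the same vector space as the labels. Each label is then scored by its closest subregion, so different labels can be explained by different parts of the image. The function names, the toy 2-D vectors, and the choice of Euclidean distance are all assumptions made for this example.

```python
from math import dist  # Euclidean distance, Python 3.8+


def label_scores(region_embeddings, label_embeddings):
    """Map each label to (distance, index) of its closest subregion.

    Hypothetical helper: both inputs are assumed to live in the same
    semantic space already.
    """
    scores = {}
    for label, label_vec in label_embeddings.items():
        distances = [dist(region, label_vec) for region in region_embeddings]
        best = min(range(len(distances)), key=distances.__getitem__)
        scores[label] = (distances[best], best)
    return scores


def annotate(region_embeddings, label_embeddings, k=2):
    """Return the k labels whose best-matching subregion is closest."""
    scores = label_scores(region_embeddings, label_embeddings)
    return sorted(scores, key=lambda label: scores[label][0])[:k]


# Toy data (assumed for illustration): three subregion embeddings, three labels.
regions = [(0.9, 0.1), (0.1, 0.95), (0.5, 0.5)]
labels = {"dog": (1.0, 0.0), "frisbee": (0.0, 1.0), "car": (-1.0, -1.0)}

print(annotate(regions, labels, k=2))
```

Because scoring only needs a label's vector, a label never seen during training could in principle be ranked the same way, which is the intuition behind using such embeddings for zero-shot learning.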
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques
