TL;DR
This paper introduces an end-to-end deep Multiple Instance Learning (MIL) framework for multi-label zero-shot image tagging. It handles multiple unseen labels at test time, requires no offline bag-generation procedure, and produces bounding boxes from weak annotations.
Contribution
It presents what the authors describe, to the best of their knowledge, as the first end-to-end trainable deep MIL framework for multi-label zero-shot tagging, with no reliance on offline bag generation methods.
Findings
Achieves superior performance on the NUS-WIDE dataset
Handles multiple unseen labels at test time
Generates bounding boxes from weak annotations
Abstract
In line with the success of deep learning on traditional recognition problems, several end-to-end deep models for zero-shot recognition have been proposed in the literature. These models successfully predict a single unseen label for an input image, but they do not scale to cases where multiple unseen objects are present. In this paper, we model this problem within the framework of Multiple Instance Learning (MIL). To the best of our knowledge, we propose the first end-to-end trainable deep MIL framework for the multi-label zero-shot tagging problem. Due to its novel design, the proposed framework has several interesting features: (1) Unlike previous deep MIL models, it does not use any off-line procedure (e.g., Selective Search or EdgeBoxes) for bag generation. (2) During test time, it can process any number of unseen labels given their semantic embedding vectors. (3) Using only…
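To make the MIL framing concrete, the sketch below scores an image (a "bag" of region features) against an arbitrary set of label embeddings by taking, for each label, the best-matching region. This is only an illustration of generic MIL max pooling under stated assumptions, not the paper's architecture: the function names, the dot-product similarity, and the toy vectors are all hypothetical, and the region features are assumed to be already projected into the same space as the label embeddings.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product, used here as an assumed similarity measure."""
    return sum(x * y for x, y in zip(a, b))


def mil_label_scores(bag: List[Vector],
                     label_embeddings: Dict[str, Vector]) -> Dict[str, float]:
    """Score each label by its best-matching instance (max pooling over the bag)."""
    return {
        label: max(dot(instance, emb) for instance in bag)
        for label, emb in label_embeddings.items()
    }


def best_instance(bag: List[Vector], emb: Vector) -> int:
    """Index of the instance that best matches a label: a crude localization cue."""
    return max(range(len(bag)), key=lambda i: dot(bag[i], emb))


def tag_image(bag: List[Vector],
              label_embeddings: Dict[str, Vector],
              top_k: int) -> List[str]:
    """Return the top_k labels ranked by bag-level score."""
    scores = mil_label_scores(bag, label_embeddings)
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Toy data (hypothetical): three region features and three unseen labels,
# all living in the same 3-dimensional embedding space.
bag = [[0.9, 0.1, 0.0], [0.0, 0.8, 0.2], [0.1, 0.1, 0.1]]
unseen = {
    "dog": [1.0, 0.0, 0.0],
    "ball": [0.0, 1.0, 0.0],
    "car": [0.0, 0.0, 1.0],
}

scores = mil_label_scores(bag, unseen)
tags = tag_image(bag, unseen, top_k=2)
dog_region = best_instance(bag, unseen["dog"])
```

Because the label set is just a dictionary of embedding vectors, any number of new labels can be added at test time without changing the scoring code, which mirrors feature (2) in the abstract. The index returned by `best_instance` hints at how a per-label region could be recovered from image-level supervision alone.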
Taxonomy
Methods: Selective Search
