TL;DR
This paper introduces a novel region-based multi-label zero-shot learning method that preserves spatial details and uses a bi-level attention module to improve class discriminability, achieving state-of-the-art results.
Contribution
It proposes a discriminability-preserving approach with region-level features and bi-level attention, addressing limitations of shared attention maps in multi-label ZSL.
Findings
Achieves an absolute gain of 6.9% mAP over the best previously published result on NUS-WIDE for ZSL.
Sets new state-of-the-art on NUS-WIDE and Open Images benchmarks.
Improves discriminability and reduces feature entanglement in multi-label ZSL.
Abstract
Multi-label zero-shot learning (ZSL) is a more realistic counterpart of standard single-label ZSL since several objects can co-exist in a natural image. However, the occurrence of multiple objects complicates the reasoning and requires region-specific processing of visual features to preserve their contextual cues. We note that the best existing multi-label ZSL method takes a shared approach towards attending to region features with a common set of attention maps for all the classes. Such shared maps lead to diffused attention, which does not discriminatively focus on relevant locations when the number of classes is large. Moreover, mapping spatially-pooled visual features to the class semantics leads to inter-class feature entanglement, thus hampering the classification. Here, we propose an alternative approach towards region-based discriminability-preserving multi-label zero-shot…
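The abstract's point about spatially-pooled features can be illustrated with a toy example. The sketch below is not the paper's model: the feature values, the two classes, and the function names `pooled_scores` and `region_scores` are all hypothetical, and max-over-regions is used only as one simple way to keep region-level information. It assumes region features have already been projected into the same space as the class embeddings.

```python
# Illustrative sketch only: toy numbers and hypothetical names, not the paper's method.

def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def pooled_scores(regions, class_embeddings):
    """Average all region features first, then score the single pooled vector."""
    n = len(regions)
    pooled = [sum(col) / n for col in zip(*regions)]
    return [dot(pooled, c) for c in class_embeddings]

def region_scores(regions, class_embeddings):
    """Score every region against every class, then keep the best region per class."""
    return [max(dot(r, c) for r in regions) for c in class_embeddings]

# Four image regions with 2-d features (assumed already in the semantic space).
# Three regions show "sky"; one small region shows "bird".
regions = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]

# Toy class embeddings for "sky" and "bird"; in ZSL, unseen classes would
# simply be additional rows here.
class_embeddings = [[1.0, 0.0], [0.0, 1.0]]

pooled = pooled_scores(regions, class_embeddings)
per_region = region_scores(regions, class_embeddings)

print(pooled)      # [0.75, 0.25]
print(per_region)  # [1.0, 1.0]
```

In this toy setting the pooled vector mixes evidence from both classes, so the small "bird" region is diluted to a score of 0.25, whereas scoring each region separately keeps its full response. This is only meant to make the motivation concrete; the paper's bi-level attention module is more involved than a max over regions.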
