Progressive Semantic-Visual Mutual Adaption for Generalized Zero-Shot Learning
Man Liu, Feng Li, Chunjie Zhang, Yunchao Wei, Huihui Bai, Yao Zhao

TL;DR
This paper introduces a progressive semantic-visual mutual adaptation network using dual transformer modules to improve semantic disambiguation and knowledge transfer in generalized zero-shot learning, addressing semantic ambiguity and bias issues.
Contribution
The paper proposes PSVMA, a network built from DSVTM modules that models semantic-visual interactions more accurately and mitigates bias in GZSL, and reports state-of-the-art performance.
Findings
Outperforms existing GZSL methods on benchmark datasets.
Effectively reduces semantic ambiguity in attribute-based recognition.
Improves generalization to unseen categories.
Abstract
Generalized Zero-Shot Learning (GZSL) identifies unseen categories using knowledge transferred from the seen domain, relying on the intrinsic interactions between visual and semantic information. Prior works mainly localize regions corresponding to the shared attributes. When various visual appearances correspond to the same attribute, the shared attributes inevitably introduce semantic ambiguity, hampering the exploration of accurate semantic-visual interactions. In this paper, we deploy the dual semantic-visual transformer module (DSVTM) to progressively model the correspondences between attribute prototypes and visual features, constituting a progressive semantic-visual mutual adaption (PSVMA) network for semantic disambiguation and knowledge transferability improvement. Specifically, DSVTM devises an instance-motivated semantic encoder that learns instance-centric prototypes to adapt…
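The idea of instance-centric prototypes can be illustrated with a small sketch. This is not the authors' DSVTM; it is a generic single-head cross-attention step, assumed here only to show how shared attribute prototypes could be adapted to one image's patch features. The function name `adapt_prototypes`, the shapes, and the residual update are all hypothetical choices made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def adapt_prototypes(prototypes, patches):
    """Illustrative cross-attention (an assumption, not the paper's module).

    prototypes: (A, D) array, one shared prototype per attribute.
    patches:    (P, D) array, visual features of one image's patches.
    Returns instance-adapted prototypes (A, D) and attention weights (A, P).
    """
    d = prototypes.shape[-1]
    # Each attribute prototype acts as a query over the image patches.
    scores = prototypes @ patches.T / np.sqrt(d)
    attn = softmax(scores, axis=-1)
    # Residual update: shared prototype plus the visual evidence it attends to.
    adapted = prototypes + attn @ patches
    return adapted, attn

rng = np.random.default_rng(0)
prototypes = rng.normal(size=(5, 16))   # 5 hypothetical attributes
patches = rng.normal(size=(49, 16))     # 7x7 grid of hypothetical patch features
adapted, attn = adapt_prototypes(prototypes, patches)
print(adapted.shape, attn.shape)
```

Because the attention weights depend on the image, the same attribute prototype ends up with a different adapted vector for each instance, which is the intuition behind reducing ambiguity when one attribute has several visual appearances.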
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Mycobacterium research and diagnosis
