TransZero++: Cross Attribute-Guided Transformer for Zero-Shot Learning
Shiming Chen, Ziming Hong, Wenjin Hou, Guo-Sen Xie, Yibing Song, Jian Zhao, Xinge You, Shuicheng Yan, and Ling Shao

TL;DR
TransZero++ introduces a cross attribute-guided Transformer architecture for zero-shot learning, enhancing attribute localization and transferability of visual features, leading to state-of-the-art results on benchmark datasets.
Contribution
The paper proposes a novel cross attribute-guided Transformer network with dual sub-nets and collaborative learning for improved zero-shot recognition.
Findings
Achieves new state-of-the-art results on three ZSL benchmarks.
Effectively localizes attributes in images for better semantic-visual embedding.
Improves transferability of visual features across datasets.
Abstract
Zero-shot learning (ZSL) tackles the novel class recognition problem by transferring semantic knowledge from seen classes to unseen ones. Existing attention-based models learn inferior region features from a single image because they rely solely on unidirectional attention, which ignores the transferability and discriminative attribute localization of visual features. In this paper, we propose a cross attribute-guided Transformer network, termed TransZero++, to refine visual features and learn accurate attribute localization for semantic-augmented visual embedding representations in ZSL. TransZero++ consists of an attribute→visual Transformer sub-net (AVT) and a visual→attribute Transformer sub-net (VAT). Specifically, AVT first takes a feature augmentation encoder to alleviate the cross-dataset problem, and improves the transferability of visual features by…
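The abstract is truncated, so the following is only a minimal sketch of the general idea behind attribute-guided attention, not the authors' implementation. It assumes that each attribute is represented by a query vector, that an image is a small set of region feature vectors, and that a class is recognized by comparing per-attribute scores against a class-level attribute vector. All function names, the toy vectors, and the scoring rule are hypothetical; learned projections, the feature augmentation encoder, the second (visual→attribute) sub-net, and collaborative learning are omitted.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def attribute_to_visual_attention(attribute_queries, region_features):
    """Hypothetical helper: each attribute query attends over image regions.

    Returns (attended, weights): one pooled region feature and one
    attention distribution over regions per attribute.
    """
    dim = len(region_features[0])
    scale = math.sqrt(dim)  # scaled dot-product attention
    attended, weights = [], []
    for q in attribute_queries:
        w = softmax([dot(q, r) / scale for r in region_features])
        pooled = [sum(wi * r[d] for wi, r in zip(w, region_features))
                  for d in range(dim)]
        attended.append(pooled)
        weights.append(w)
    return attended, weights


def classify(attribute_scores, class_attribute_vectors):
    """Pick the class whose attribute vector best matches the scores."""
    sims = {name: dot(attribute_scores, vec)
            for name, vec in class_attribute_vectors.items()}
    return max(sims, key=sims.get)


# Toy data (assumed, for illustration only): 2 attributes, 3 regions, dim 2.
attribute_queries = [[1.0, 0.0], [0.0, 1.0]]
region_features = [[4.0, 0.0], [0.0, 0.0], [0.0, 0.0]]

attended, weights = attribute_to_visual_attention(attribute_queries,
                                                  region_features)

# Score each attribute by how well its pooled feature matches its query.
attribute_scores = [dot(q, a) for q, a in zip(attribute_queries, attended)]

# Class-level attribute vectors; an unseen class needs only this vector,
# not training images, which is what makes the setup zero-shot.
class_attribute_vectors = {
    "class_with_attr0": [1.0, 0.0],
    "class_with_attr1": [0.0, 1.0],
}
predicted = classify(attribute_scores, class_attribute_vectors)
print(predicted, weights[0])
```

In this toy setup the attention weights for the first attribute concentrate on the first region, which is a simple stand-in for attribute localization; a real model would learn the queries and projections from seen-class data.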
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · COVID-19 diagnosis using AI · Multimodal Machine Learning Applications
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Dense Connections · Position-Wise Feed-Forward Layer · Residual Connection · Layer Normalization · Dropout · Label Smoothing · Byte Pair Encoding
