TransZero++: Cross Attribute-Guided Transformer for Zero-Shot Learning

Shiming Chen; Ziming Hong; Wenjin Hou; Guo-Sen Xie; Yibing Song; Jian Zhao; Xinge You; Shuicheng Yan; and Ling Shao

arXiv:2112.08643·cs.CV·December 14, 2022·6 cites

TL;DR

TransZero++ introduces a cross attribute-guided Transformer architecture for zero-shot learning, enhancing attribute localization and transferability of visual features, leading to state-of-the-art results on benchmark datasets.

Contribution

The paper proposes a novel cross attribute-guided Transformer network with dual sub-nets and collaborative learning for improved zero-shot recognition.

Findings

01 Achieves new state-of-the-art results on three ZSL benchmarks.

02 Effectively localizes attributes in images for better semantic-visual embedding.

03 Improves transferability of visual features across datasets.

Abstract

Zero-shot learning (ZSL) tackles the novel class recognition problem by transferring semantic knowledge from seen classes to unseen ones. Existing attention-based models learn inferior region features from a single image because they rely solely on unidirectional attention, which ignores the transferability and discriminative attribute localization of visual features. In this paper, we propose a cross attribute-guided Transformer network, termed TransZero++, to refine visual features and learn accurate attribute localization for semantic-augmented visual embedding representations in ZSL. TransZero++ consists of an attribute $\to$ visual Transformer sub-net (AVT) and a visual $\to$ attribute Transformer sub-net (VAT). Specifically, AVT first takes a feature augmentation encoder to alleviate the cross-dataset problem, and improves the transferability of visual features by…
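
The attribute $\to$ visual direction mentioned in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it has no learned parameters, no feature augmentation encoder, and no VAT sub-net. It only shows the general idea of using attribute vectors as attention queries over image region features, then scoring classes by their attribute descriptions. All names (`attribute_to_visual_attention`, `predict_class`, the "striped"/"winged" attributes, the "zebra"/"bird" classes) and all numbers are hypothetical and chosen for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attribute_to_visual_attention(attr_queries, region_feats):
    """Each attribute vector acts as a query over the image regions.

    Returns one attended feature per attribute plus the attention
    weights, which indicate where that attribute is localized.
    """
    dim = len(region_feats[0])
    scale = math.sqrt(dim)  # scaled dot-product attention
    attended, weights = [], []
    for q in attr_queries:
        w = softmax([dot(q, r) / scale for r in region_feats])
        feat = [sum(w_i * r[d] for w_i, r in zip(w, region_feats))
                for d in range(dim)]
        attended.append(feat)
        weights.append(w)
    return attended, weights


def predict_class(attr_queries, region_feats, class_attr_table):
    """Score classes by matching per-attribute evidence to class attributes.

    Assumption: attribute evidence is the dot product between the
    attended feature and its attribute query. A class needs no training
    images, only an attribute vector, which is the zero-shot setting.
    """
    attended, _ = attribute_to_visual_attention(attr_queries, region_feats)
    attr_scores = [dot(f, q) for f, q in zip(attended, attr_queries)]
    class_scores = {name: dot(attr_scores, vec)
                    for name, vec in class_attr_table.items()}
    best = max(class_scores, key=class_scores.get)
    return best, class_scores


# Hypothetical toy data: two attributes embedded in a 2-D feature space.
attr_queries = [[1.0, 0.0],   # "striped"
                [0.0, 1.0]]   # "winged"
# Three image regions; region 0 responds strongly to "striped".
region_feats = [[4.0, 0.0], [0.0, 0.1], [0.2, 0.0]]
# Class descriptions as attribute vectors (striped, winged).
class_attr_table = {"zebra": [1.0, 0.0], "bird": [0.0, 1.0]}

label, scores = predict_class(attr_queries, region_feats, class_attr_table)
print(label)  # prints: zebra
```

In this sketch the attention weights for the "striped" query peak on region 0, a crude stand-in for attribute localization. The visual $\to$ attribute direction would swap the roles of queries and keys and is not shown here.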

Code & Models

Repositories

shiming-chen/transzero_pp
PyTorch · Official

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · COVID-19 diagnosis using AI · Multimodal Machine Learning Applications

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Dense Connections · Position-Wise Feed-Forward Layer · Residual Connection · Layer Normalization · Dropout · Label Smoothing · Byte Pair Encoding