Multi-Head Self-Attention via Vision Transformer for Zero-Shot Learning

Faisal Alamri; Anjan Dutta

arXiv:2108.00045·cs.CV·August 3, 2021·23 cites

TL;DR

This paper introduces an attention-based Vision Transformer model for Zero-Shot Learning (ZSL) that splits images into patches to capture discriminative attributes, achieving state-of-the-art harmonic mean results on the AWA2, CUB and SUN benchmarks.

Contribution

The paper proposes an attention mechanism adapted from the Vision Transformer (ViT) to explicitly localize and learn attributes for recognizing unseen classes in ZSL.
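In attribute-based ZSL, an image is mapped to a vector of predicted attribute scores, and an unseen class is recognized by comparing that vector with each unseen class's annotated attribute signature. The sketch below illustrates that general inference step under loud assumptions: a plain dot-product compatibility score, and hypothetical attribute names and values. It is not the paper's exact scoring function.

```python
from typing import Dict, List


def dot(a: List[float], b: List[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def predict_unseen_class(
    predicted_attributes: List[float],
    class_attributes: Dict[str, List[float]],
) -> str:
    """Return the class whose attribute vector is most compatible
    (highest dot product) with the attributes predicted for the image.

    Assumption: plain dot-product compatibility; the paper's scoring may differ.
    """
    return max(
        class_attributes,
        key=lambda c: dot(predicted_attributes, class_attributes[c]),
    )


# Hypothetical attributes: [has_stripes, has_hooves, lives_in_water]
unseen_classes = {
    "zebra": [1.0, 1.0, 0.0],
    "dolphin": [0.0, 0.0, 1.0],
}
image_attrs = [0.9, 0.7, 0.1]  # hypothetical output of an attribute predictor
print(predict_unseen_class(image_attrs, unseen_classes))  # zebra
```

No training images of "zebra" or "dolphin" are needed at this step: only their attribute signatures, which is what makes recognition of unseen classes possible.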

Findings

01

Achieved new state-of-the-art harmonic mean results on AWA2, CUB, and SUN datasets.

02

Demonstrated the effectiveness of attribute localization via attention in ZSL.

03

Compared the approach against existing ZSL methods on all three benchmarks, where it achieved the best harmonic mean.
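Under the standard generalized ZSL protocol, the harmonic mean reported above is computed from the accuracy on seen classes S and on unseen classes U as H = 2SU / (S + U). A minimal sketch follows; the numbers are illustrative only, not results from the paper.

```python
def harmonic_mean(acc_seen: float, acc_unseen: float) -> float:
    """Harmonic mean H of seen-class (S) and unseen-class (U) accuracy:
    H = 2 * S * U / (S + U). Returns 0 when both accuracies are 0."""
    if acc_seen + acc_unseen == 0:
        return 0.0
    return 2 * acc_seen * acc_unseen / (acc_seen + acc_unseen)


# Illustrative numbers only, not results from the paper.
print(harmonic_mean(80.0, 20.0))  # 32.0
print(harmonic_mean(50.0, 50.0))  # 50.0
```

Because H drops sharply when either accuracy is low (80 and 20 give 32, whereas the arithmetic mean would be 50), it rewards models that do not trade unseen-class accuracy for seen-class accuracy.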

Abstract

Zero-Shot Learning (ZSL) aims to recognise unseen object classes, which are not observed during the training phase. The existing body of work on ZSL mostly relies on pretrained visual features and lacks an explicit mechanism for localising attributes in images. In this work, we propose an attention-based model in the ZSL problem setting to learn attributes useful for unseen class recognition. Our method uses an attention mechanism adapted from the Vision Transformer to capture and learn discriminative attributes by splitting images into small patches. We conduct experiments on three popular ZSL benchmarks (i.e., AWA2, CUB and SUN) and set new state-of-the-art harmonic mean results on all three datasets, which illustrates the effectiveness of our proposed method.
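The patch-based attention described in the abstract can be illustrated with a minimal pure-Python sketch of a generic ViT-style front end: the image is cut into non-overlapping patches, each patch is flattened and linearly embedded as a token, and multi-head scaled dot-product self-attention lets every patch token attend to every other. Assumptions: a single-channel toy image, randomly initialised weights standing in for learned ones, and no class token, positional encodings, layer normalisation or feed-forward sublayer. All helper names are hypothetical; this is not the authors' implementation.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def split_into_patches(image: Matrix, patch: int) -> Matrix:
    """Cut a single-channel H x W image into non-overlapping patch x patch
    tiles and flatten each tile into one row (a "token")."""
    h, w = len(image), len(image[0])
    assert h % patch == 0 and w % patch == 0, "image must tile exactly"
    tokens = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            tokens.append([image[r][c]
                           for r in range(top, top + patch)
                           for c in range(left, left + patch)])
    return tokens


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b using nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def multi_head_self_attention(x: Matrix, num_heads: int, rng: random.Random) -> Matrix:
    """Scaled dot-product self-attention over all tokens, computed in
    num_heads independent subspaces, concatenated and projected back."""
    d_model = len(x[0])
    d_head = d_model // num_heads
    head_outputs = []
    for _ in range(num_heads):
        # Per-head projections (random here; learned in a real model).
        q = matmul(x, random_matrix(d_model, d_head, rng))
        k = matmul(x, random_matrix(d_model, d_head, rng))
        v = matmul(x, random_matrix(d_model, d_head, rng))
        # scores[i][j]: how strongly token i attends to token j.
        k_t = [list(col) for col in zip(*k)]
        scores = [[s / math.sqrt(d_head) for s in row] for row in matmul(q, k_t)]
        weights = [softmax(row) for row in scores]
        head_outputs.append(matmul(weights, v))
    # Concatenate heads token by token, then apply the output projection.
    concat = [sum((h[i] for h in head_outputs), []) for i in range(len(x))]
    return matmul(concat, random_matrix(num_heads * d_head, d_model, rng))


rng = random.Random(0)
image = [[rng.random() for _ in range(8)] for _ in range(8)]  # toy 8x8 "image"
patches = split_into_patches(image, patch=4)                  # 4 tokens of 16 values
embed = matmul(patches, random_matrix(16, 8, rng))            # linear patch embedding, d_model = 8
out = multi_head_self_attention(embed, num_heads=2, rng=rng)
print(len(patches), len(out), len(out[0]))  # 4 4 8
```

Each row of `weights` is a distribution over patches, which is why self-attention over patches is a natural place to look for where in the image an attribute is being read from.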

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · COVID-19 diagnosis using AI · Advanced Neural Network Applications

Methods: Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Label Smoothing · Residual Connection · Softmax · Adam · Layer Normalization · Dropout