Visual Transformers with Primal Object Queries for Multi-Label Image Classification

Vacit Oguz Yazici; Joost van de Weijer; Longlong Yu

arXiv:2112.05485·cs.CV·May 17, 2022

TL;DR

This paper introduces primal object queries in vision transformers for multi-label image classification, enhancing performance and convergence speed over previous methods.

Contribution

It proposes primal object queries, which are provided only at the start of the transformer decoder stack, improving training efficiency and accuracy.

Findings

01 Improves class-wise F1 score by 2.1% on MS-COCO
02 Speeds up convergence by 79% on MS-COCO
03 Achieves state-of-the-art results on NUS-WIDE

Abstract

Multi-label image classification is about predicting a set of class labels that can be considered as orderless sequential data. Transformers process sequential data as a whole and are therefore inherently good at set prediction. The first vision-based transformer model, which was proposed for the object detection task, introduced the concept of object queries. Object queries are learnable positional encodings that are used by attention modules in decoder layers to decode the object classes or bounding boxes using the regions of interest in an image. However, inputting the same set of object queries to different decoder layers hinders training: it results in lower performance and delays convergence. In this paper, we propose the usage of primal object queries that are only provided at the start of the transformer decoder stack. In addition, we improve the mixup technique…
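The contrast the abstract draws, between feeding the same object queries to every decoder layer and providing them only at the start of the stack, can be illustrated with a toy sketch. This is not the authors' implementation: the decoder layer here is reduced to a single cross-attention step with a residual connection (no self-attention, feed-forward block, layer normalization, or learned projections), and every name in it (`decoder_layer`, `decode_repeated`, `decode_primal`, `memory`) is hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, memory):
    # queries: (num_queries, d); memory: (num_tokens, d) image features.
    scores = queries @ memory.T / np.sqrt(queries.shape[-1])
    return softmax(scores) @ memory


def decoder_layer(hidden, memory, query_pos=None):
    # Simplified layer (assumption): cross-attention plus a residual.
    # If query_pos is given, it is added to the attention query only.
    q = hidden if query_pos is None else hidden + query_pos
    return hidden + cross_attention(q, memory)


def decode_repeated(object_queries, memory, num_layers):
    # The same object queries are injected into every layer;
    # the decoder input starts from zeros.
    hidden = np.zeros_like(object_queries)
    for _ in range(num_layers):
        hidden = decoder_layer(hidden, memory, query_pos=object_queries)
    return hidden


def decode_primal(object_queries, memory, num_layers):
    # Primal variant: the queries are the input to the first layer
    # and are never re-injected afterwards.
    hidden = object_queries
    for _ in range(num_layers):
        hidden = decoder_layer(hidden, memory)
    return hidden


rng = np.random.default_rng(0)
memory = rng.normal(size=(49, 16))         # e.g. a 7x7 feature map, d=16
object_queries = rng.normal(size=(5, 16))  # one query per label slot

out_repeated = decode_repeated(object_queries, memory, num_layers=3)
out_primal = decode_primal(object_queries, memory, num_layers=3)
```

In this simplified form the only difference is where the queries enter: in `decode_primal` they travel through the residual path from the first layer onward, while in `decode_repeated` they are re-added to the attention input at every layer.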

Code & Models

Repositories

voyazici/visual-transformers-classification
PyTorch · Official

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning · Image Retrieval and Classification Techniques

Methods: Mixup