Query2Label: A Simple Transformer Way to Multi-Label Classification

Shilong Liu; Lei Zhang; Xiao Yang; Hang Su; Jun Zhu

arXiv:2107.10834 · cs.CV · July 23, 2021 · 120 citations

TL;DR

This paper introduces Query2Label, a simple Transformer-based framework for multi-label classification. Label embeddings act as decoder queries that adaptively pool local features for each class, and the method reports state-of-the-art results on five datasets, including 91.3% mAP on MS-COCO.

Contribution

The paper proposes a simple, effective Transformer decoder approach that uses label queries for multi-label classification and outperforms previous methods while relying only on standard Transformers and vision backbones.

Findings

01. Achieves 91.3% mAP on MS-COCO
02. Outperforms prior methods on five datasets
03. Uses standard Transformers and vision backbones

Abstract

This paper presents a simple and effective approach to solving the multi-label classification problem. The proposed approach leverages Transformer decoders to query the existence of a class label. The use of the Transformer is rooted in the need to extract local discriminative features adaptively for different labels, a strongly desired property because of the existence of multiple objects in one image. The built-in cross-attention module in the Transformer decoder offers an effective way to use label embeddings as queries to probe and pool class-related features from a feature map computed by a vision backbone for subsequent binary classifications. Compared with prior works, the new framework is simple, using standard Transformers and vision backbones, and effective, consistently outperforming all previous works on five multi-label classification datasets, including MS-COCO,…
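
The cross-attention pooling described in the abstract can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation: it assumes a single attention head with no learned query/key/value projections, and it omits the self-attention, feed-forward, normalization, and position-encoding parts of a standard Transformer decoder layer. The names `query_labels`, `label_queries`, `class_weights`, and `class_biases` are hypothetical, and the random values stand in for learned parameters and backbone features.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def query_labels(label_queries, features, class_weights, class_biases):
    """Simplified label-query cross-attention (assumption: one head, no projections).

    label_queries: K label embeddings, each of dimension d
    features:      N spatial feature vectors (a flattened H*W feature map), each of dimension d
    class_weights, class_biases: one binary classifier per label
    Returns per-label probabilities and per-label attention maps.
    """
    d = len(features[0])
    probs, attn_maps = [], []
    for q, w, b in zip(label_queries, class_weights, class_biases):
        # Each label query scores every spatial position (scaled dot product).
        attn = softmax([dot(q, f) / math.sqrt(d) for f in features])
        # Pool a label-specific feature as the attention-weighted sum of positions.
        pooled = [sum(a * f[j] for a, f in zip(attn, features)) for j in range(d)]
        # Independent binary classification for this label (sigmoid, not softmax).
        logit = dot(w, pooled) + b
        probs.append(1.0 / (1.0 + math.exp(-logit)))
        attn_maps.append(attn)
    return probs, attn_maps


# Toy example: 3 labels, a 2x2 feature map (4 positions), feature dimension 8.
rng = random.Random(0)
K, N, d = 3, 4, 8
label_queries = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(K)]
features = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(N)]
class_weights = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(K)]
class_biases = [0.0] * K

probs, attn_maps = query_labels(label_queries, features, class_weights, class_biases)
for k, (p, attn) in enumerate(zip(probs, attn_maps)):
    print(f"label {k}: p={p:.3f}, attention={[round(a, 3) for a in attn]}")
```

Each label gets its own attention map over the spatial positions, so different labels can pool from different image regions before their independent sigmoid classifiers.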

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Text and Document Classification Technologies · Domain Adaptation and Few-Shot Learning

Methods: Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Label Smoothing · Residual Connection · Concatenated Skip Connection · Softmax · Dense Connections · Adam