Combining Metric Learning and Attention Heads For Accurate and Efficient Multilabel Image Classification

Kirill Prokofiev; Vladislav Sovrasov

arXiv:2209.06585 · cs.CV · December 21, 2022 · 1 citation

TL;DR

This paper introduces a metric learning-based training strategy for multilabel image classification, applied to both transformer-based heads and label-relation graph branches. It reports state-of-the-art results while requiring fewer computational resources at inference.

Contribution

The authors propose a metric learning modification of Asymmetric Loss (ASL), the de facto standard loss for multilabel classification, together with a training strategy that improves discrimination between labels. They report results that outperform existing methods.

Findings

01. Achieves SOTA results on MS-COCO, PASCAL-VOC, NUS-Wide, and Visual Genome 500 datasets.

02. Reduces computational resources needed for inference compared to transformer-based approaches.

03. Demonstrates that graph-based methods with proper training can rival transformer-based heads in accuracy.

Abstract

Multilabel image classification allows predicting a set of labels from a given image. Unlike multiclass classification, where only one label is assigned per image, this setup suits a broader range of applications. In this work we revisit two popular approaches to multilabel classification: transformer-based heads and graph-processing branches that exploit label relation information. Although transformer-based heads are considered to achieve better results than graph-based branches, we argue that with the proper training strategy, graph-based methods can show only a small accuracy drop while spending fewer computational resources on inference. In our training strategy, instead of Asymmetric Loss (ASL), which is the de facto standard for multilabel classification, we introduce its metric learning modification. In each binary classification sub-problem it operates with $L_{2}$ …
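For orientation, the baseline that the abstract refers to, Asymmetric Loss, can be sketched in a few lines. This is only the commonly cited form of ASL, not the metric learning modification proposed in the paper, whose details are cut off in the abstract above. The function name `asymmetric_loss` and the hyperparameter values (`gamma_pos`, `gamma_neg`, `margin`) are illustrative assumptions, not values taken from this page.

```python
import math


def asymmetric_loss(probs, targets, gamma_pos=0.0, gamma_neg=4.0,
                    margin=0.05, eps=1e-8):
    """Sum of per-label ASL terms for a single image.

    probs   -- predicted probabilities, one per label (after a sigmoid)
    targets -- 0/1 ground-truth flags, one per label
    The default hyperparameters are assumptions chosen for illustration.
    """
    total = 0.0
    for p, y in zip(probs, targets):
        if y == 1:
            # Positive label: focal-style term with its own exponent.
            total += -((1.0 - p) ** gamma_pos) * math.log(max(p, eps))
        else:
            # Negative label: shift the probability down by the margin so
            # that very easy negatives contribute exactly zero loss.
            p_m = max(p - margin, 0.0)
            total += -(p_m ** gamma_neg) * math.log(max(1.0 - p_m, eps))
    return total


# Each label is an independent binary sub-problem, so several labels
# can be active for the same image.
loss = asymmetric_loss([0.9, 0.2, 0.03], [1, 0, 0])
```

With both exponents and the margin set to zero, the function reduces to ordinary binary cross-entropy summed over labels, which is a quick way to sanity-check an implementation.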

Code & Models

Repositories

openvinotoolkit/deep-object-reid (PyTorch, official)

Taxonomy

Topics: Text and Document Classification Technologies · Machine Learning in Bioinformatics · Advanced Image and Video Retrieval Techniques