Combining Metric Learning and Attention Heads For Accurate and Efficient Multilabel Image Classification
Kirill Prokofiev, Vladislav Sovrasov

TL;DR
This paper introduces a metric-learning-based training strategy for multilabel image classification that achieves state-of-the-art results with fewer computational resources by leveraging transformer heads and label relation graphs.
Contribution
The authors propose a metric learning modification of Asymmetric Loss (ASL), the de facto standard loss for multilabel classification, and a training strategy built on it that improves discrimination and outperforms existing methods.
Findings
Achieves SOTA results on the MS-COCO, PASCAL-VOC, NUS-WIDE, and Visual Genome 500 datasets.
Reduces computational resources needed for inference compared to transformer-based approaches.
Demonstrates that graph-based methods with proper training can rival transformer-based heads in accuracy.
Abstract
Multi-label image classification allows predicting a set of labels from a given image. Unlike multiclass classification, where only one label per image is assigned, such a setup is applicable for a broader range of applications. In this work we revisit two popular approaches to multilabel classification: transformer-based heads and labels relations information graph processing branches. Although transformer-based heads are considered to achieve better results than graph-based branches, we argue that with the proper training strategy, graph-based methods can demonstrate just a small accuracy drop, while spending less computational resources on inference. In our training strategy, instead of Asymmetric Loss (ASL), which is the de-facto standard for multilabel classification, we introduce its metric learning modification. In each binary classification sub-problem it operates with …
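For readers unfamiliar with the baseline named in the abstract, the following is a minimal sketch of the standard Asymmetric Loss (ASL) for a single image, written in plain Python. It is not the authors' metric learning modification, whose details are cut off above. The function name `asymmetric_loss` and the default values of `gamma_pos`, `gamma_neg`, and `margin` are illustrative assumptions, not settings taken from this paper.

```python
import math


def asymmetric_loss(logits, targets, gamma_pos=0.0, gamma_neg=4.0,
                    margin=0.05, eps=1e-8):
    """Standard ASL for one image (sketch; defaults are assumptions).

    logits  -- one raw score per label
    targets -- one 0/1 ground-truth flag per label
    Each label is treated as an independent binary sub-problem.
    """
    total = 0.0
    for z, y in zip(logits, targets):
        p = 1.0 / (1.0 + math.exp(-z))  # sigmoid probability for this label
        if y == 1:
            # Positive term: focal-style weight (1 - p)^gamma_pos
            total -= (1.0 - p) ** gamma_pos * math.log(max(p, eps))
        else:
            # Negative term: shift the probability down by the margin so that
            # very easy negatives (p <= margin) contribute exactly zero loss
            p_m = max(p - margin, 0.0)
            total -= p_m ** gamma_neg * math.log(max(1.0 - p_m, eps))
    return total


# Usage: three labels, only the first one is present in the image
loss = asymmetric_loss([2.0, -1.0, -6.0], [1, 0, 0])
print(f"ASL for this image: {loss:.4f}")
```

The asymmetry comes from using a larger focusing exponent and a probability margin on the negative side only, which down-weights the many easy negative labels that dominate multilabel datasets.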
Taxonomy
Topics: Text and Document Classification Technologies · Machine Learning in Bioinformatics · Advanced Image and Video Retrieval Techniques
