Transformer-based Dual Relation Graph for Multi-label Image Recognition
Jiawei Zhao, Ke Yan, Yifan Zhao, Xiaowei Guo, Feiyue Huang, Jia Li

TL;DR
This paper introduces a Transformer-based Dual Relation Graph framework for multi-label image recognition, leveraging structural and semantic relations to improve recognition accuracy and achieve state-of-the-art results.
Contribution
The paper proposes a novel dual relation learning framework that combines structural and semantic relation graphs with a Transformer architecture to improve multi-label recognition.
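A Transformer captures relations through self-attention: every token attends to every other token, regardless of spatial distance. The following is a hypothetical, minimal illustration of that general idea. It is not the authors' architecture, and it rests on three assumptions:

- It uses single-head scaled dot-product attention.
- The query/key/value projections are identities. Real Transformers learn these projections.
- "Cross-scale" is taken to mean concatenating tokens from two feature-map resolutions so that tokens from each scale can attend to the other.

The function names `self_attention` and `cross_scale_attention` are illustrative only.

```python
import math
from typing import List

Vector = List[float]


def softmax(xs: Vector) -> Vector:
    # Subtract the max for numerical stability before exponentiating
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens: List[Vector]) -> List[Vector]:
    """Single-head scaled dot-product self-attention.

    Assumption: identity Q/K/V projections (a learned model would use
    trainable weight matrices here).
    """
    d = len(tokens[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for q in tokens:
        # Similarity of this query token to every token, near or far
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in tokens]
        weights = softmax(scores)
        # Output is an attention-weighted average of all value vectors
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return out


def cross_scale_attention(fine: List[Vector], coarse: List[Vector]) -> List[Vector]:
    # Hypothetical "cross-scale" step: pool tokens from two resolutions into
    # one sequence so fine and coarse tokens can attend to each other.
    return self_attention(fine + coarse)


fine = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]  # e.g. a flattened 2x2 map
coarse = [[0.2, 0.8]]                                     # e.g. a pooled 1x1 map
out = cross_scale_attention(fine, coarse)
print(len(out))  # 5
```

Each output token is a convex combination of all input tokens. A fine-scale token can therefore absorb context from the coarse scale in a single step, which is the property that makes attention suited to long-range correlations.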
Findings
Achieved state-of-the-art performance on the MS-COCO and VOC 2007 datasets.
Effectively models long-range correlations between objects as well as their semantic meanings.
Improved robustness through joint learning of the two relation graphs.
Abstract
The simultaneous recognition of multiple objects in one image remains challenging. It involves several difficulties in the recognition field, such as varying object scales, inconsistent appearances, and confused inter-class relationships. Recent research efforts mainly rely on statistical label co-occurrences and linguistic word embeddings to enhance unclear semantics. Different from these works, in this paper we propose a novel Transformer-based Dual Relation learning framework. It constructs complementary relationships by exploring two aspects of correlation, i.e., a structural relation graph and a semantic relation graph. The structural relation graph aims to capture long-range correlations from object context by developing a cross-scale transformer-based architecture. The semantic graph dynamically models the semantic meanings of image objects with explicit semantic-aware…
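The abstract contrasts the method with approaches built on statistical label co-occurrences. The following is a hedged illustration of what such a statistic typically looks like. It is the prior-work ingredient the paper departs from, not part of the proposed framework.

The code computes a conditional co-occurrence matrix with `P[i][j]` = P(label j present | label i present), estimated from binary multi-label annotations. The function name `cooccurrence_conditional` and the toy data are hypothetical.

```python
from typing import List


def cooccurrence_conditional(labels: List[List[int]]) -> List[List[float]]:
    """Return P[i][j] = P(label j present | label i present).

    `labels` holds one binary vector per image (1 = class present).
    """
    num_classes = len(labels[0])
    counts = [0] * num_classes                          # images containing class i
    co = [[0] * num_classes for _ in range(num_classes)]  # images containing both i and j
    for y in labels:
        present = [c for c in range(num_classes) if y[c]]
        for i in present:
            counts[i] += 1
            for j in present:
                co[i][j] += 1
    # Normalize each row by how often its conditioning class appears
    return [
        [co[i][j] / counts[i] if counts[i] else 0.0 for j in range(num_classes)]
        for i in range(num_classes)
    ]


# Toy annotations for three hypothetical classes (0, 1, 2)
labels = [[1, 1, 0], [1, 0, 0], [0, 1, 1], [1, 1, 1]]
P = cooccurrence_conditional(labels)
print(round(P[0][1], 3))  # 0.667
print(P[2][1])            # 1.0
```

Note that the matrix is asymmetric: class 2 always appears with class 1, but not the reverse. Graph-based methods commonly threshold and re-weight such a matrix and then use it as a fixed adjacency for label-graph propagation. The abstract instead describes its semantic graph as being modeled dynamically.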
Taxonomy
Topics: Text and Document Classification Technologies · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
