Learning Semantic-Specific Graph Representation for Multi-Label Image Recognition
Tianshui Chen, Muxin Xu, Xiaolu Hui, Hefeng Wu, Liang Lin

TL;DR
This paper introduces SSGRL (Semantic-Specific Graph Representation Learning), a framework for multi-label image recognition that learns a separate representation for each category, guided by category semantics, and models the interactions among these representations with a graph built on label co-occurrence.
Contribution
The paper proposes a semantic decoupling and interaction framework that explicitly models label co-occurrence and semantic region interactions for better multi-label recognition.
Findings
Reports higher accuracy than prior state-of-the-art methods on multiple benchmarks.
Improves mean Average Precision (mAP) over those methods.
Demonstrates effectiveness of graph-based semantic interaction modeling.
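For readers unfamiliar with the metric named above, the following is a minimal sketch of how mAP is commonly computed for multi-label recognition: average precision (AP) is computed per label from the ranked prediction scores, then averaged over labels. It uses the non-interpolated form of AP, which is an assumption; the exact evaluation protocol used in the paper is not stated in this summary, and the function names are illustrative only.

```python
def average_precision(scores, targets):
    """Non-interpolated AP for one label.

    scores:  list of predicted confidences, one per image
    targets: list of 0/1 ground-truth flags, one per image
    """
    # Rank images from highest to lowest score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precisions = []
    for rank, idx in enumerate(order, start=1):
        if targets[idx] == 1:
            hits += 1
            # Precision at the rank where this positive is retrieved.
            precisions.append(hits / rank)
    # A label with no positives contributes 0.0 (a convention, assumed here).
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(score_matrix, target_matrix):
    """Mean of per-label AP. Both arguments are lists of per-image rows."""
    num_labels = len(score_matrix[0])
    aps = []
    for c in range(num_labels):
        scores = [row[c] for row in score_matrix]
        targets = [row[c] for row in target_matrix]
        aps.append(average_precision(scores, targets))
    return sum(aps) / num_labels
```

Because AP depends only on the ranking of scores within each label, the metric needs no per-label decision threshold.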
Abstract
Recognizing multiple labels of images is a practical and challenging task, and significant progress has been made by searching semantic-aware regions and modeling label dependency. However, current methods cannot locate the semantic regions accurately due to the lack of part-level supervision or semantic guidance. Moreover, they cannot fully explore the mutual interactions among the semantic regions and do not explicitly model the label co-occurrence. To address these issues, we propose a Semantic-Specific Graph Representation Learning (SSGRL) framework that consists of two crucial modules: 1) a semantic decoupling module that incorporates category semantics to guide learning semantic-specific representations and 2) a semantic interaction module that correlates these representations with a graph built on the statistical label co-occurrence and explores their interactions via a graph…
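The abstract mentions a graph built on statistical label co-occurrence. One way such a graph could be derived from training annotations is sketched below: each edge weight estimates the conditional probability that label j is present given that label i is present. This normalization and the function name `cooccurrence_graph` are assumptions made for illustration; the summary does not specify how the authors construct or normalize the graph.

```python
def cooccurrence_graph(label_rows):
    """Build a directed co-occurrence matrix from multi-hot annotations.

    label_rows: list of per-image lists of 0/1 flags, one flag per label.
    Returns A, where A[i][j] estimates P(label j present | label i present).
    """
    num_labels = len(label_rows[0])
    counts = [0] * num_labels                               # images containing label i
    pair = [[0] * num_labels for _ in range(num_labels)]    # images containing both i and j

    for row in label_rows:
        present = [c for c, flag in enumerate(row) if flag]
        for i in present:
            counts[i] += 1
            for j in present:
                if i != j:
                    pair[i][j] += 1

    # Normalize each row by the count of its source label; labels that
    # never occur get all-zero rows instead of a division by zero.
    return [
        [pair[i][j] / counts[i] if counts[i] else 0.0 for j in range(num_labels)]
        for i in range(num_labels)
    ]
```

The resulting matrix is generally asymmetric: a rare label may almost always appear alongside a common one, while the reverse is not true.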
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques
