Learning Spatial Regularization with Image-level Supervisions for Multi-label Image Classification
Feng Zhu, Hongsheng Li, Wanli Ouyang, Nenghai Yu, and Xiaogang Wang

TL;DR
This paper introduces a deep neural network that leverages image-level supervision to model both semantic and spatial label relations, significantly enhancing multi-label image classification performance without requiring detailed spatial annotations.
Contribution
The proposed Spatial Regularization Network (SRN) captures both spatial and semantic label relations using only image-level annotations, which improves classification accuracy.
Findings
Outperforms state-of-the-art methods on three public datasets.
Effectively captures semantic and spatial label relations.
Improves classification performance with end-to-end training.
Abstract
Multi-label image classification is a fundamental but challenging task in computer vision. Great progress has been achieved by exploiting semantic relations between labels in recent years. However, conventional approaches are unable to model the underlying spatial relations between labels in multi-label images, because spatial annotations of the labels are generally not provided. In this paper, we propose a unified deep neural network that exploits both semantic and spatial relations between labels with only image-level supervisions. Given a multi-label image, our proposed Spatial Regularization Network (SRN) generates attention maps for all labels and captures the underlying relations between them via learnable convolutions. By aggregating the regularized classification results with original results by a ResNet-101 network, the classification performance can be consistently improved.…
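The pipeline described in the abstract (per-label attention maps, learnable convolutions over them, then aggregation with the backbone's scores) can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name srn_sketch, the weight names w_att, w_conf and w_rel, the confidence-weighting step, the single full-extent relation kernel, and the mixing weight alpha are all hypothetical choices made for this example.

```python
import numpy as np

def spatial_softmax(z):
    """Normalize each label's map over its spatial positions. z: (L, H, W)."""
    L, H, W = z.shape
    flat = z.reshape(L, -1)
    flat = flat - flat.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(flat)
    return (e / e.sum(axis=1, keepdims=True)).reshape(L, H, W)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def srn_sketch(features, w_att, w_conf, w_rel, base_logits, alpha=0.5):
    """Illustrative forward pass (all names are assumptions for this sketch).

    features:    (C, H, W) feature map from a backbone such as ResNet-101
    w_att:       (L, C) 1x1-conv weights producing one attention map per label
    w_conf:      (L, C) 1x1-conv weights producing per-position label confidences
    w_rel:       (L, L, H, W) learnable kernels mixing all labels' weighted maps
    base_logits: (L,) the backbone's original classification scores
    alpha:       assumed mixing weight between original and regularized scores
    """
    # One attention map per label, each summing to 1 over space.
    att = spatial_softmax(np.einsum("lc,chw->lhw", w_att, features))
    # Per-position confidence for each label (assumed weighting step).
    conf = sigmoid(np.einsum("lc,chw->lhw", w_conf, features))
    weighted = att * conf
    # A convolution whose kernel spans the whole map: every output label
    # sees every input label's map, so label relations can be learned.
    reg_logits = np.einsum("klhw,lhw->k", w_rel, weighted)
    # Aggregate regularized scores with the original scores.
    return alpha * base_logits + (1.0 - alpha) * reg_logits, att

rng = np.random.default_rng(0)
C, H, W, L = 8, 4, 4, 3
scores, att = srn_sketch(
    rng.normal(size=(C, H, W)),
    rng.normal(size=(L, C)),
    rng.normal(size=(L, C)),
    rng.normal(size=(L, L, H, W)),
    rng.normal(size=L),
)
```

Because every step is differentiable, only image-level labels on the final scores are needed to train the attention and relation weights in a setup like this; no spatial annotations enter the computation.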
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Text and Document Classification Technologies · Multimodal Machine Learning Applications
