Learning Spatial Regularization with Image-level Supervisions for Multi-label Image Classification

Feng Zhu, Hongsheng Li, Wanli Ouyang, Nenghai Yu, and Xiaogang Wang

arXiv:1702.05891 · cs.CV · April 3, 2017 · 55 cites

TL;DR

This paper introduces a deep neural network that uses only image-level supervision to model both semantic and spatial relations between labels, improving multi-label image classification performance without requiring spatial annotations.

Contribution

The proposed Spatial Regularization Network (SRN) captures spatial and semantic label relations using only image-level annotations, improving classification accuracy.

Findings

01. Outperforms state-of-the-art methods on three public datasets.

02. Effectively captures semantic and spatial label relations.

03. Improves classification performance with end-to-end training.

Abstract

Multi-label image classification is a fundamental but challenging task in computer vision. Great progress has been achieved by exploiting semantic relations between labels in recent years. However, conventional approaches are unable to model the underlying spatial relations between labels in multi-label images, because spatial annotations of the labels are generally not provided. In this paper, we propose a unified deep neural network that exploits both semantic and spatial relations between labels with only image-level supervisions. Given a multi-label image, our proposed Spatial Regularization Network (SRN) generates attention maps for all labels and captures the underlying relations between them via learnable convolutions. By aggregating the regularized classification results with the original results of a ResNet-101 network, the classification performance can be consistently improved.…
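The data flow described in the abstract (per-label attention maps, learnable convolutions over them, and aggregation with the main classifier's output) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The 1x1 convolutions, the sigmoid confidence weighting, the spatial sum, the mixing weight `alpha`, the helper `init_params`, and all parameter names are assumptions chosen for brevity, and a small random feature map stands in for ResNet-101 features.

```python
import numpy as np

def conv1x1(x, w, b):
    # 1x1 convolution: x is (C_in, H, W), w is (C_out, C_in), b is (C_out,).
    return np.einsum("oc,chw->ohw", w, x) + b[:, None, None]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def spatial_softmax(z):
    # Normalize each label's map so it sums to 1 over all spatial positions.
    n_labels, h, w = z.shape
    flat = z.reshape(n_labels, h * w)
    flat = flat - flat.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(flat)
    return (e / e.sum(axis=1, keepdims=True)).reshape(n_labels, h, w)

def init_params(n_channels, n_labels, rng):
    # Hypothetical helper: small random weights for every layer in the sketch.
    def weights(*shape):
        return 0.1 * rng.normal(size=shape)
    return {
        "w_cls": weights(n_labels, n_channels), "b_cls": np.zeros(n_labels),
        "w_att": weights(n_labels, n_channels), "b_att": np.zeros(n_labels),
        "w_conf": weights(n_labels, n_channels), "b_conf": np.zeros(n_labels),
        "w_rel": weights(n_labels, n_labels), "b_rel": np.zeros(n_labels),
    }

def srn_sketch(features, params, alpha=0.5):
    # features: (C, H, W) map from a backbone (ResNet-101 in the paper).
    # Main branch: global average pooling followed by a linear classifier.
    pooled = features.mean(axis=(1, 2))
    logits_cls = params["w_cls"] @ pooled + params["b_cls"]

    # One attention map per label, learned from image-level labels only.
    attention = spatial_softmax(
        conv1x1(features, params["w_att"], params["b_att"]))

    # Assumption: weight each attention map by per-location label confidence.
    confidence = sigmoid(conv1x1(features, params["w_conf"], params["b_conf"]))
    weighted = attention * confidence

    # Learnable convolution across the label channels, so that one label's
    # map can influence another label's score; then reduce over space.
    related = conv1x1(weighted, params["w_rel"], params["b_rel"])
    logits_sr = related.sum(axis=(1, 2))

    # Aggregate the regularized results with the main-branch results.
    # alpha is an assumed mixing weight, not a value taken from the text.
    logits = alpha * logits_cls + (1.0 - alpha) * logits_sr
    return sigmoid(logits), attention

rng = np.random.default_rng(0)
features = rng.normal(size=(8, 4, 4))       # toy stand-in for CNN features
params = init_params(n_channels=8, n_labels=3, rng=rng)
probs, attention = srn_sketch(features, params)
print(probs.shape, attention.shape)
```

Each label gets an independent probability, as multi-label classification requires, and training such a model end to end would only need image-level label vectors as targets.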

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Text and Document Classification Technologies · Multimodal Machine Learning Applications