Attention to Refine through Multi-Scales for Semantic Segmentation

Shiqi Yang; Gang Peng

arXiv:1807.02917 · cs.CV · July 10, 2018

TL;DR

This paper introduces a multi-scale attention model for semantic segmentation that softly weights features from multiple input scales at each pixel to refine predictions, with competitive results on PASCAL VOC 2012 and ADE20K.

Contribution

The paper presents a multi-scale attention model with a location attention branch, which softly weights multi-scale features at each pixel, and a parallel recalibration branch, which reweights the score map per class.
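The abstract describes the recalibration branch only as recalibrating the score map per class. One plausible reading, assumed here rather than taken from the paper, is a squeeze-and-excitation style gate: pool a feature map globally, pass it through a small bottleneck, and squash the result into one weight per class. In this NumPy sketch, `recalibrate` and its parameters `w1`, `b1`, `w2`, `b2` are hypothetical, and `features` stands in for whatever feature map also feeds the location attention branch.

```python
import numpy as np


def sigmoid(x):
    # Logistic function mapping any real value into (0, 1).
    return 1.0 / (1.0 + np.exp(-x))


def recalibrate(score_map, features, w1, b1, w2, b2):
    """Reweight each class channel of a score map (assumed SE-style gate).

    score_map: (C, H, W) class scores to be recalibrated.
    features:  (D, H, W) features shared with the location attention branch
               (assumption about where this branch taps in).
    w1: (R, D), b1: (R,), w2: (C, R), b2: (C,) hypothetical learned weights.
    Returns the recalibrated (C, H, W) map and the (C,) class weights.
    """
    pooled = features.mean(axis=(1, 2))          # squeeze: one value per feature channel
    hidden = np.maximum(0.0, w1 @ pooled + b1)   # bottleneck layer with ReLU
    class_weights = sigmoid(w2 @ hidden + b2)    # one gate in (0, 1) per class
    return score_map * class_weights[:, None, None], class_weights
```

Because every gate lies in (0, 1), this kind of branch can only scale classes down relative to one another; with all parameters at zero, every gate is 0.5 and the map is uniformly halved.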

Findings

1. Achieves competitive results on PASCAL VOC 2012.
2. Surpasses the baseline and related methods on ADE20K.
3. Demonstrates effective multi-scale feature aggregation.

Abstract

This paper proposes a novel attention model for semantic segmentation that aggregates multi-scale and context features to refine predictions. Specifically, the backbone convolutional neural network takes inputs at several different scales, so that it obtains representations at each scale. The proposed attention model processes the features from each scale stream separately and then integrates them. Its location attention branch learns to softly weight the multi-scale features at each pixel location. In addition, we add a recalibrating branch, parallel to the location attention output, that recalibrates the score map per class. We achieve competitive results on the PASCAL VOC 2012 and ADE20K datasets, surpassing the baseline and related works.
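The location attention step can be sketched as a per-pixel softmax over scales followed by a weighted sum of the scale streams. This minimal NumPy sketch assumes each scale stream has already produced class score maps resized to a common resolution, and that the attention branch outputs one unnormalized score per scale and pixel; the names `fuse_scales` and `attention_logits` are hypothetical and not taken from the paper.

```python
import numpy as np


def softmax(x, axis):
    # Numerically stable softmax along `axis`.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def fuse_scales(score_maps, attention_logits):
    """Fuse multi-scale score maps with per-pixel soft attention over scales.

    score_maps:       (S, C, H, W) class scores from S scale streams,
                      already resized to a shared H x W grid (assumption).
    attention_logits: (S, H, W) unnormalized attention, one value per scale
                      and pixel (assumed output of the location attention branch).
    Returns the fused (C, H, W) score map.
    """
    weights = softmax(attention_logits, axis=0)               # sums to 1 over scales at each pixel
    return (weights[:, None, :, :] * score_maps).sum(axis=0)  # weighted sum over scales


# Usage: two scale streams, three classes, a 4 x 4 grid.
rng = np.random.default_rng(0)
maps = rng.normal(size=(2, 3, 4, 4))
fused = fuse_scales(maps, np.zeros((2, 4, 4)))  # equal logits give a plain average
print(fused.shape)  # (3, 4, 4)
```

Equal logits reduce the fusion to averaging the scales, while a dominant logit at a pixel makes that pixel take its scores almost entirely from one scale; this per-pixel freedom is the intuition behind letting the model favor different scales in different image regions.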

Taxonomy

Topics: Advanced Neural Network Applications · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning