Pyramid Attention Network for Semantic Segmentation
Hanchao Li, Pengfei Xiong, Jie An, Lingxue Wang

TL;DR
This paper introduces the Pyramid Attention Network (PAN), which combines attention mechanisms with a spatial pyramid to capture global context and extract precise dense features for semantic segmentation. It reports state-of-the-art results on PASCAL VOC 2012 and Cityscapes.
Contribution
The paper presents a Pyramid Attention Network that integrates attention modules with a spatial pyramid, using them in place of complicated dilated convolutions and hand-designed decoder networks, to improve semantic segmentation performance.
Findings
Achieved 84.0% mIoU on PASCAL VOC 2012 without COCO training.
Outperformed existing methods on PASCAL VOC 2012 and Cityscapes benchmarks.
Introduced Feature Pyramid Attention and Global Attention Upsample modules.
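For readers unfamiliar with the metric quoted in the findings, mIoU (mean intersection over union) averages the per-class overlap between predicted and ground-truth labels. The following is a minimal sketch in plain Python, not the benchmark's official evaluation code: the function name `mean_iou`, the flat label lists, and the choice to skip classes absent from both prediction and ground truth are all assumptions made for illustration. Benchmark evaluations typically accumulate counts over the whole test set rather than per image.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union over flat lists of integer class labels.

    Hypothetical helper for illustration only; classes that appear in
    neither `pred` nor `target` are skipped (an assumption of this sketch).
    """
    ious = []
    for c in range(num_classes):
        # Pixels where both prediction and ground truth are class c.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels where either prediction or ground truth is class c.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with two classes and four pixels.
score = mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
```

In the toy example, class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12.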
Abstract
A Pyramid Attention Network (PAN) is proposed to exploit the impact of global contextual information in semantic segmentation. Different from most existing works, we combine an attention mechanism and a spatial pyramid to extract precise dense features for pixel labeling, instead of complicated dilated convolutions and artificially designed decoder networks. Specifically, we introduce a Feature Pyramid Attention module, which applies a spatial pyramid attention structure to the high-level output and combines it with global pooling to learn a better feature representation, and a Global Attention Upsample module on each decoder layer, which provides global context as guidance for low-level features to select category localization details. The proposed approach achieves state-of-the-art performance on the PASCAL VOC 2012 and Cityscapes benchmarks, with a new record mIoU of 84.0% on PASCAL VOC 2012, while training…
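The idea behind the Global Attention Upsample module, where globally pooled high-level features guide low-level features, can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the function name `global_attention_upsample`, the `w_gate` matrix (a stand-in for a learned projection of the pooled vector), the ReLU gate, the nearest-neighbour upsampling, and the omission of all convolution and normalization layers are choices made here to keep the example short.

```python
import numpy as np


def global_attention_upsample(low, high, w_gate):
    """Simplified sketch of channel-wise global guidance in a decoder step.

    low:    (C, H, W) low-level feature map
    high:   (C, h, w) high-level feature map, with H and W multiples of h and w
    w_gate: (C, C) hypothetical projection applied to the pooled context
    """
    _, H, W = low.shape
    _, h, w = high.shape

    # Global average pooling over the spatial dimensions gives one
    # context value per channel.
    context = high.mean(axis=(1, 2))                      # shape (C,)

    # Project the context and apply ReLU to obtain channel weights
    # (an assumed stand-in for a learned layer).
    gate = np.maximum(w_gate @ context, 0.0)              # shape (C,)

    # Re-weight the low-level features channel by channel.
    weighted_low = low * gate[:, None, None]              # shape (C, H, W)

    # Nearest-neighbour upsampling of the high-level map to (H, W).
    high_up = high.repeat(H // h, axis=1).repeat(W // w, axis=2)

    # Fuse the upsampled high-level features with the guided low-level ones.
    return high_up + weighted_low


# Toy usage: 2 channels, 4x4 low-level map, 2x2 high-level map.
out = global_attention_upsample(
    np.ones((2, 4, 4)), np.full((2, 2, 2), 2.0), np.eye(2)
)
```

With the toy inputs, every channel's pooled context is 2, so the low-level map is scaled by 2 and added to the upsampled high-level map; `out` has shape `(2, 4, 4)`.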
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications
