Multi-scale Attention U-Net (MsAUNet): A Modified U-Net Architecture for Scene Segmentation
Soham Chattopadhyay, Hritam Basak

TL;DR
This paper introduces the Multi-scale Attention U-Net (MsAUNet), which improves scene segmentation by integrating attention gates and a compound loss function, yielding higher accuracy and faster convergence on the PascalVOC2012 and ADE20k datasets.
Contribution
The paper presents a novel multi-scale attention mechanism within a modified U-Net architecture and a combined loss function for better scene segmentation performance.
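As a rough illustration of what a combined loss can look like, the sketch below pairs a soft Dice loss (Dice Loss is listed under Methods for this paper) with binary cross-entropy. The cross-entropy term, the `weight` parameter, and the function names are assumptions made for illustration only; this summary does not state which terms or weights the paper's compound loss actually uses.

```python
import numpy as np

def dice_loss(probs, target, eps=1e-6):
    """Soft Dice loss for one class: 1 - 2|P.T| / (|P| + |T|)."""
    probs = np.asarray(probs, dtype=float)
    target = np.asarray(target, dtype=float)
    intersection = (probs * target).sum()
    denominator = probs.sum() + target.sum()
    # eps keeps the ratio defined when both masks are empty
    return 1.0 - (2.0 * intersection + eps) / (denominator + eps)

def bce_loss(probs, target, eps=1e-7):
    """Mean binary cross-entropy; probabilities are clipped away from 0 and 1."""
    probs = np.clip(np.asarray(probs, dtype=float), eps, 1.0 - eps)
    target = np.asarray(target, dtype=float)
    return float(-np.mean(target * np.log(probs) + (1.0 - target) * np.log(1.0 - probs)))

def compound_loss(probs, target, weight=0.5):
    """Hypothetical weighted sum of Dice and cross-entropy (weight is an assumption)."""
    return weight * dice_loss(probs, target) + (1.0 - weight) * bce_loss(probs, target)
```

The Dice term rewards region overlap and is less sensitive to class imbalance, while a pixel-wise term such as cross-entropy gives a smoother per-pixel gradient, which is one common motivation for combining the two.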
Findings
Achieved 79.88% mean IoU on PascalVOC2012
Achieved 44.88% mean IoU on ADE20k
Outperformed existing models in segmentation accuracy
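For readers unfamiliar with the metric behind these numbers, here is a minimal sketch of mean intersection-over-union (mean IoU) on a single pair of label maps. The function name `mean_iou` and the choice to skip classes absent from both maps are assumptions; benchmark evaluations typically accumulate intersections and unions over the whole dataset rather than per image, so this is not the paper's evaluation code.

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Average per-class IoU over classes present in pred or target."""
    pred = np.asarray(pred)
    target = np.asarray(target)
    ious = []
    for c in range(num_classes):
        pred_c = pred == c
        target_c = target == c
        union = np.logical_or(pred_c, target_c).sum()
        if union == 0:
            continue  # class appears in neither map; leave it out of the mean
        intersection = np.logical_and(pred_c, target_c).sum()
        ious.append(intersection / union)
    return float(np.mean(ious)) if ious else float("nan")

# Tiny 2x2 example with two classes
pred = [[0, 0], [1, 1]]
target = [[0, 1], [1, 1]]
score = mean_iou(pred, target, num_classes=2)  # class 0: 1/2, class 1: 2/3
```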
Abstract
Despite the recent success of convolutional neural networks (CNNs) in scene segmentation, standard models lack some important features, which can lead to sub-optimal segmentation outputs. The widely used encoder-decoder architecture extracts and uses several redundant, low-level features at different steps and scales. These networks also fail to map the long-range dependencies of local features, which results in discriminative feature maps corresponding to each semantic class in the resulting segmented image. In this paper, we propose a novel multi-scale attention network for scene segmentation that uses the rich contextual information in an image. Unlike the original U-Net architecture, we use attention gates that take the features from the encoder and the output of the pyramid pool as input and produce…
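To make the idea of an attention gate with two inputs concrete, the following is a minimal NumPy sketch of an additive attention gate of the kind used in Attention U-Net. It is an assumption that MsAUNet's gates follow this form; the abstract is truncated here and does not give the exact formulation. The names `attention_gate`, `w_x`, `w_g`, and `psi` are hypothetical, 1x1 convolutions are written as per-pixel matrix products, and both inputs are assumed to already share the same spatial resolution.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def attention_gate(x, g, w_x, w_g, psi):
    """Additive attention gate (assumed form, not confirmed by the source).

    x:   (H, W, Cx) encoder features to be gated
    g:   (H, W, Cg) gating signal, e.g. a pyramid-pooling output
    w_x: (Cx, Ci), w_g: (Cg, Ci), psi: (Ci,) act as 1x1 convolutions
    """
    q = np.maximum(x @ w_x + g @ w_g, 0.0)  # ReLU of summed projections, (H, W, Ci)
    alpha = sigmoid(q @ psi)                # per-pixel attention in (0, 1), (H, W)
    return x * alpha[..., None], alpha      # scale every channel of x by alpha
```

The gate leaves the shape of the encoder features unchanged and only rescales them, so it can sit on a skip connection without altering the rest of the decoder.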
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications
Methods: Dice Loss · Convolution
