Multi-Scale Feature Aggregation by Cross-Scale Pixel-to-Region Relation Operation for Semantic Segmentation
Yechao Bai, Ziyuan Huang, Lyuyu Shen, Hongliang Guo, Marcelo H. Ang Jr, and Daniela Rus

TL;DR
This paper introduces a cross-scale pixel-to-region relation operation, with dedicated modules for semantic segmentation, that improves context aggregation and efficiency over conventional sum- or concatenation-based feature fusion.
Contribution
It proposes the Relational Semantics Extractor (RSE) and Relational Semantics Propagator (RSP) modules for multi-scale feature aggregation in semantic segmentation.
Findings
Outperforms DeepLabV3 by 0.7% in accuracy.
Requires 75% fewer FLOPs than the baseline.
Demonstrates effectiveness on Cityscapes and COCO datasets.
Abstract
Exploiting multi-scale features has shown great potential in tackling semantic segmentation problems. The aggregation is commonly done with sum or concatenation (concat) followed by convolutional (conv) layers. However, it fully passes down the high-level context to the following hierarchy without considering their interrelation. In this work, we aim to enable the low-level feature to aggregate the complementary context from adjacent high-level feature maps by a cross-scale pixel-to-region relation operation. We leverage cross-scale context propagation to make the long-range dependency capturable even by the high-resolution low-level features. To this end, we employ an efficient feature pyramid network to obtain multi-scale features. We propose a Relational Semantics Extractor (RSE) and Relational Semantics Propagator (RSP) for context extraction and propagation respectively. Then we…
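The abstract does not spell out the exact form of the cross-scale pixel-to-region relation, so the following is only a minimal sketch of the general idea. It is not the authors' RSE/RSP implementation. It rests on several assumptions. Both feature maps already share the channel count C, which a real network would arrange with learned 1x1 projections that are omitted here. The "extractor" simply average-pools the coarser high-level map into a grid of region descriptors. The "propagator" lets every high-resolution low-level pixel attend over those regions with scaled dot-product weights and adds the result back residually. The names `extract_regions`, `propagate`, and `grid` are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along `axis`."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def extract_regions(high, grid):
    """Hypothetical extractor: average-pool a high-level map (C, H, W)
    into grid x grid regions, returning descriptors of shape (grid*grid, C)."""
    c, h, w = high.shape
    assert h % grid == 0 and w % grid == 0, "H and W must be divisible by grid"
    blocks = high.reshape(c, grid, h // grid, grid, w // grid)
    pooled = blocks.mean(axis=(2, 4))          # (C, grid, grid)
    return pooled.reshape(c, grid * grid).T    # (R, C) with R = grid*grid


def propagate(low, regions):
    """Hypothetical propagator: each low-level pixel (C, H, W) attends over
    the region descriptors (R, C) and adds the gathered context residually."""
    c, h, w = low.shape
    pixels = low.reshape(c, h * w).T                  # (HW, C)
    affinity = pixels @ regions.T / np.sqrt(c)        # (HW, R) pixel-to-region relation
    weights = softmax(affinity, axis=1)               # each pixel's weights sum to 1
    context = weights @ regions                       # (HW, C) aggregated region context
    out = pixels + context                            # residual aggregation
    return out.T.reshape(c, h, w), weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    low = rng.standard_normal((8, 32, 32))    # high-resolution, low-level features
    high = rng.standard_normal((8, 8, 8))     # low-resolution, high-level features
    regions = extract_regions(high, grid=4)   # 16 region descriptors
    out, weights = propagate(low, regions)
    print(out.shape, weights.shape)
```

Because each of the HW pixels is compared against only R region descriptors instead of all HW pixels, this sketch builds an HW x R affinity matrix rather than HW x HW. That is one plausible reason a pixel-to-region relation can give high-resolution features long-range context at modest cost. The paper's actual modules may differ in their pooling, projections, and fusion.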
Taxonomy
Methods: Spatial Pyramid Pooling · Batch Normalization · Atrous Spatial Pyramid Pooling · 1x1 Convolution · Dilated Convolution · DeepLabv3
