Deep Multi-Branch Aggregation Network for Real-Time Semantic Segmentation in Street Scenes

Xi Weng; Yan Yan; Genshun Dong; Chang Shu; Biao Wang; Hanzi Wang; Ji Zhang

arXiv:2203.04037·cs.CV·March 9, 2022

TL;DR

This paper introduces DMA-Net, a real-time semantic segmentation network for street scenes that balances accuracy against inference speed by aggregating multi-scale features and modeling global context.

Contribution

The paper proposes DMA-Net with a multi-branch aggregation decoder and novel modules like lattice enhanced residual blocks, improving segmentation quality without sacrificing speed.

Findings

01. Achieves 77.0% mIoU on Cityscapes at 46.7 FPS
02. Attains 73.6% mIoU on CamVid at 119.8 FPS
03. Demonstrates a superior tradeoff between accuracy and inference speed
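
The two quantities reported above, mIoU and FPS, can be made concrete with a small sketch. This is a minimal pure-Python illustration, not the paper's evaluation code: the function names `mean_iou` and `latency_ms` are hypothetical, labels are passed as flat lists, and the ignore value 255 follows the usual Cityscapes convention for unlabeled pixels (an assumption here, since the excerpt does not describe the evaluation protocol).

```python
def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean intersection-over-union across classes seen in pred or target.

    pred and target are flat sequences of integer class labels.
    Pixels whose target equals ignore_index are skipped (assumed convention).
    """
    intersection = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue
        if p == t:
            # True positive: counts once in both intersection and union.
            intersection[t] += 1
            union[t] += 1
        else:
            # False positive for class p, false negative for class t.
            union[p] += 1
            union[t] += 1
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious)


def latency_ms(fps):
    """Average per-frame latency in milliseconds implied by a frame rate."""
    return 1000.0 / fps


# Toy labels, for illustration only (not data from the paper).
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 255]
print(round(mean_iou(pred, target, num_classes=3), 4))
print(round(latency_ms(46.7), 2), round(latency_ms(119.8), 2))
```

Read this way, 46.7 FPS corresponds to roughly 21.4 ms per frame and 119.8 FPS to roughly 8.3 ms per frame. Benchmark mIoU is normally accumulated over the whole validation set rather than per image, which this toy version does not distinguish.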

Abstract

Real-time semantic segmentation, which aims to achieve high segmentation accuracy at real-time inference speed, has received substantial attention over the past few years. However, many state-of-the-art real-time semantic segmentation methods tend to sacrifice some spatial details or contextual information for fast inference, thus leading to degradation in segmentation quality. In this paper, we propose a novel Deep Multi-branch Aggregation Network (called DMA-Net) based on the encoder-decoder structure to perform real-time semantic segmentation in street scenes. Specifically, we first adopt ResNet-18 as the encoder to efficiently generate various levels of feature maps from different stages of convolutions. Then, we develop a Multi-branch Aggregation Network (MAN) as the decoder to effectively aggregate different levels of feature maps and capture the multi-scale information. In MAN, a…
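
To give a sense of what "various levels of feature maps" means for a ResNet-18 encoder, the sketch below computes the output shape of each residual stage for a given input size. It relies only on the standard ResNet-18 layout (output strides 4, 8, 16, 32 with 64, 128, 256, 512 channels). The helper name `feature_map_shapes` is hypothetical, and the 1024x2048 input is the native Cityscapes resolution, used as an assumption because the excerpt does not state the resolution DMA-Net runs at. It does not reproduce MAN or any other module from the paper.

```python
import math

# Standard ResNet-18 residual stages: (name, output stride, channels).
RESNET18_STAGES = [
    ("layer1", 4, 64),
    ("layer2", 8, 128),
    ("layer3", 16, 256),
    ("layer4", 32, 512),
]


def feature_map_shapes(height, width):
    """Return {stage: (channels, H, W)} for a standard ResNet-18 backbone."""
    shapes = {}
    for name, stride, channels in RESNET18_STAGES:
        # Each stride-2 layer in ResNet maps a size n to ceil(n / 2),
        # so a stage at output stride s yields ceil(n / s).
        shapes[name] = (
            channels,
            math.ceil(height / stride),
            math.ceil(width / stride),
        )
    return shapes


# Assumed input size: full-resolution Cityscapes frame.
for stage, shape in feature_map_shapes(1024, 2048).items():
    print(stage, shape)
```

A decoder that aggregates these levels has to reconcile a 64-channel map at 1/4 resolution with a 512-channel map at 1/32 resolution, which is the multi-scale gap the abstract refers to.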

Taxonomy

Methods: Layer Normalization · 1x1 Convolution · Softmax · Batch Normalization · Convolution · Residual Connection · Residual Block