BiSeNet V2: Bilateral Network with Guided Aggregation for Real-time Semantic Segmentation
Changqian Yu, Changxin Gao, Jingbo Wang, Gang Yu, Chunhua Shen, Nong, Sang

TL;DR
BiSeNet V2 introduces a dual-branch architecture that separately captures low-level details and high-level semantics, enabling real-time semantic segmentation with high accuracy and speed.
Contribution
The paper presents a novel bilateral network architecture with a guided aggregation layer and booster training, achieving superior real-time segmentation performance.
Findings
Achieves 72.6% Mean IoU on Cityscapes test set.
Runs at 156 FPS on a single GPU.
Outperforms existing real-time segmentation methods.
Abstract
The low-level details and high-level semantics are both essential to the semantic segmentation task. However, to speed up the model inference, current approaches almost always sacrifice the low-level details, which leads to a considerable accuracy decrease. We propose to treat these spatial details and categorical semantics separately to achieve high accuracy and high efficiency for realtime semantic segmentation. To this end, we propose an efficient and effective architecture with a good trade-off between speed and accuracy, termed Bilateral Segmentation Network (BiSeNet V2). This architecture involves: (i) a Detail Branch, with wide channels and shallow layers to capture low-level details and generate high-resolution feature representation; (ii) a Semantic Branch, with narrow channels and deep layers to obtain high-level semantic context. The Semantic Branch is lightweight due to…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsAdvanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications
Methods*Communicated@Fast*How Do I Communicate to Expedia? · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Batch Normalization · Residual Connection · Depthwise Convolution · 1x1 Convolution · Average Pooling · Convolution · Bilateral Guided Aggregation Layer · BiSeNet V2
