Cross-CBAM: A Lightweight network for Scene Segmentation
Zhengbin Zhang, Zhenhao Xu, Xingsheng Gu, Juan Xiong

TL;DR
The paper introduces Cross-CBAM, a lightweight real-time scene segmentation network that combines novel attention modules and multiscale pooling to achieve high accuracy and speed on edge devices.
Contribution
It proposes the SE-ASPP and CCBAM modules, enabling efficient multiscale feature extraction and feature fusion with cross-attention for improved real-time segmentation.
Findings
Achieves 73.4% mIoU at 240.9 FPS on Cityscapes
Attains 77.2% mIoU at 88.6 FPS on Cityscapes with GTX 1080Ti
Demonstrates a favorable accuracy-speed trade-off on benchmark datasets
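For readers unfamiliar with the metric, mIoU (mean intersection over union) averages the per-class overlap between predicted and ground-truth labels. The following is a minimal sketch of the standard definition, not the authors' evaluation code; the function name `mean_iou` and the flat-list input format are assumptions made for illustration. Benchmark evaluations typically accumulate counts over the whole evaluation set and skip ignored labels, which this sketch omits.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over classes that appear in pred or target.

    pred, target: flat sequences of integer class ids of equal length
    (hypothetical input format, one id per pixel).
    """
    ious = []
    for c in range(num_classes):
        # Pixels where both prediction and ground truth are class c.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels where either prediction or ground truth is class c.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with four pixels and two classes.
score = mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
```

In the toy example, class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12.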
Abstract
Scene parsing is a major challenge for real-time semantic segmentation. Although traditional semantic segmentation networks have made remarkable advances in accuracy, their inference speed remains unsatisfactory. Moreover, this progress has been achieved with fairly large networks and powerful computational resources. Such large models are difficult to run on edge computing devices with limited computing power, which poses a huge challenge for real-time semantic segmentation. In this paper, we present the Cross-CBAM network, a novel lightweight network for real-time semantic segmentation. Specifically, a Squeeze-and-Excitation Atrous Spatial Pyramid Pooling module (SE-ASPP) is proposed to obtain a variable field of view and multiscale information. We also propose a Cross Convolutional Block Attention Module (CCBAM), in which a cross-multiply operation…
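The abstract does not spell out how squeeze-and-excitation is wired into the ASPP branches, so the sketch below shows only the generic squeeze-and-excitation channel gating that the module name refers to: global average pooling per channel, a two-layer bottleneck with ReLU and sigmoid, then channel-wise rescaling. It is an illustration under stated assumptions, not the paper's SE-ASPP; the function name `squeeze_excite`, the nested-list tensor layout, and the bias-free weights `w1` and `w2` are all hypothetical.

```python
import math


def squeeze_excite(feature_map, w1, w2):
    """Generic squeeze-and-excitation gating (illustrative, not the paper's module).

    feature_map: list of C channels, each an H x W list of lists of floats.
    w1: hidden x C weight matrix, w2: C x hidden weight matrix
    (hypothetical, bias-free weights).
    Returns (rescaled feature map, per-channel gates).
    """
    # Squeeze: global average pooling reduces each channel to one scalar.
    squeezed = [
        sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
        for ch in feature_map
    ]
    # Excitation: fully connected -> ReLU -> fully connected -> sigmoid.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [
        [[v * g for v in row] for row in ch]
        for ch, g in zip(feature_map, gates)
    ]
    return scaled, gates


# Two 2x2 channels, one hidden unit; zero output weights give gates of 0.5.
fmap = [[[1.0, 2.0], [3.0, 4.0]], [[2.0, 2.0], [2.0, 2.0]]]
scaled, gates = squeeze_excite(fmap, w1=[[1.0, 1.0]], w2=[[0.0], [0.0]])
```

Each gate lies strictly between 0 and 1, so the operation can only attenuate channels relative to one another; with the zero output weights used above, every gate is exactly 0.5.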
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications · Advanced Neural Network Applications
Methods: Test · 1x1 Convolution · Convolution · Feature Pyramid Network · Spatial Pyramid Pooling · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Focus
