Cross-CBAM: A Lightweight network for Scene Segmentation

Zhengbin Zhang; Zhenhao Xu; Xingsheng Gu; Juan Xiong

arXiv:2306.02306 · cs.CV · June 6, 2023 · 1 citation

Open Access

TL;DR

The paper introduces Cross-CBAM, a lightweight real-time scene segmentation network that combines novel attention modules and multiscale pooling to achieve high accuracy and speed on edge devices.

Contribution

It proposes the SE-ASPP and CCBAM modules, enabling efficient multiscale feature extraction and feature fusion with cross-attention for improved real-time segmentation.

Findings

1. Achieves 73.4% mIoU at 240.9 FPS on Cityscapes

2. Attains 77.2% mIoU at 88.6 FPS on Cityscapes with a GTX 1080Ti

3. Demonstrates a favorable accuracy-speed trade-off on benchmark datasets
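
To make the two reported quantities concrete, here is a minimal sketch of how mIoU is commonly computed from a confusion matrix and how a throughput in FPS translates to average per-frame time. The function names `mean_iou` and `latency_ms` and the toy confusion matrix are illustrative assumptions, not taken from the paper, and the paper's exact evaluation protocol may differ.

```python
def mean_iou(confusion):
    """Mean intersection-over-union from a square confusion matrix.

    confusion[i][j] counts pixels whose true class is i and whose
    predicted class is j. Classes with an empty union are skipped.
    """
    ious = []
    for c in range(len(confusion)):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                  # true c, predicted otherwise
        fp = sum(row[c] for row in confusion) - tp   # predicted c, true otherwise
        union = tp + fp + fn
        if union > 0:
            ious.append(tp / union)
    return sum(ious) / len(ious)


def latency_ms(fps):
    """Average per-frame time in milliseconds implied by a throughput in FPS."""
    return 1000.0 / fps


# Toy 2-class confusion matrix (made-up numbers, not from the paper).
toy = [[8, 2],
       [1, 9]]
print("toy mIoU:", round(mean_iou(toy), 4))

# Throughputs quoted in the findings above.
for fps in (240.9, 88.6):
    print(f"{fps} FPS is about {latency_ms(fps):.2f} ms per frame")
```

Read this way, the two operating points correspond to roughly 4 ms and 11 ms per frame, which is one way to view the accuracy-speed trade-off in the findings.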

Abstract

Scene parsing poses a great challenge for real-time semantic segmentation. Although traditional semantic segmentation networks have made remarkable advances in accuracy, their inference speed remains unsatisfactory. Moreover, this progress has been achieved with fairly large networks and powerful computational resources, and such large models are difficult to run on edge computing devices with limited computing power, which makes real-time semantic segmentation a major challenge. In this paper, we present the Cross-CBAM network, a novel lightweight network for real-time semantic segmentation. Specifically, a Squeeze-and-Excitation Atrous Spatial Pyramid Pooling module (SE-ASPP) is proposed to obtain a variable field of view and multiscale information. We also propose a Cross Convolutional Block Attention Module (CCBAM), in which a cross-multiply operation…
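
The two ideas named in SE-ASPP can be illustrated with a toy sketch: atrous (dilated) convolutions at several rates give branches with different fields of view, and a squeeze-and-excitation style gate reweights those branches. This is not the paper's architecture. It is a 1-D, pure-Python illustration, and the dilation rates, averaging kernel, and gate weights (`rates`, `kernel`, `w1`, `w2`) are hypothetical values chosen only for the example. CCBAM is not sketched here because its description is truncated in the abstract above.

```python
import math


def dilated_conv1d(signal, kernel, dilation):
    """Same-size 1-D convolution with zero padding and the given dilation.

    Kernel taps are spaced `dilation` samples apart, so a 3-tap kernel
    spans 2 * dilation + 1 input samples.
    """
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + (k - half) * dilation
            if 0 <= j < len(signal):   # taps outside the signal read as zero
                acc += w * signal[j]
        out.append(acc)
    return out


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def se_gate(channels, w1, w2):
    """Squeeze-and-excitation style reweighting of a list of channels.

    squeeze: global average of each channel
    excite:  two small dense layers (ReLU, then sigmoid), one gate per channel
    scale:   multiply every channel by its gate
    """
    squeezed = [sum(ch) / len(ch) for ch in channels]
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    gates = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w2]
    scaled = [[g * v for v in ch] for g, ch in zip(gates, channels)]
    return scaled, gates


signal = [0.0, 1.0, 2.0, 3.0, 4.0, 3.0, 2.0, 1.0, 0.0]
kernel = [1 / 3, 1 / 3, 1 / 3]   # hypothetical averaging kernel
rates = [1, 2, 4]                # hypothetical dilation rates

# One branch per dilation rate, all with the same output length.
branches = [dilated_conv1d(signal, kernel, r) for r in rates]

# Hypothetical fixed weights: 3 channels -> 2 hidden units -> 3 gates.
w1 = [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5]]
w2 = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
scaled, gates = se_gate(branches, w1, w2)

for r, g in zip(rates, gates):
    print(f"dilation {r}: spans {2 * r + 1} samples, gate {g:.3f}")
```

In a real network the branches would be 2-D feature maps and the gate weights would be learned; the sketch only shows why larger dilation rates widen the field of view without adding kernel taps.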

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications · Advanced Neural Network Applications

Methods: Test · 1x1 Convolution · Convolution · Feature Pyramid Network · Spatial Pyramid Pooling · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Focus