ECMNet: Lightweight Semantic Segmentation with Efficient CNN-Mamba Network

Feixiang Du; Shengkun Wu

arXiv:2506.08629·cs.CV·February 6, 2026

TL;DR

ECMNet is a lightweight semantic segmentation model that combines CNN and Mamba for improved accuracy and efficiency, utilizing novel attention modules and feature fusion techniques.

Contribution

The paper introduces ECMNet, a novel CNN-Mamba hybrid architecture with specialized attention and fusion modules for enhanced segmentation performance.

Findings

01 Achieves 70.6% mIoU on Cityscapes

02 Achieves 73.6% mIoU on CamVid

03 Uses only 0.87M parameters with high efficiency
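The mIoU figures above are mean intersection-over-union scores: per-class IoU is TP / (TP + FP + FN), averaged over classes. The sketch below shows one common way to compute it from flattened label lists. It is an illustration only, not the paper's evaluation code; the function names `confusion_matrix` and `mean_iou` are hypothetical, and the `ignore_index=255` default is an assumption based on the usual Cityscapes convention.

```python
from typing import List, Sequence


def confusion_matrix(pred: Sequence[int], target: Sequence[int],
                     num_classes: int, ignore_index: int = 255) -> List[List[int]]:
    """Build a num_classes x num_classes matrix with rows = ground truth, cols = prediction."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        if t == ignore_index:  # assumed "void" label; such pixels are skipped
            continue
        cm[t][p] += 1
    return cm


def mean_iou(cm: List[List[int]]) -> float:
    """Average per-class IoU = TP / (TP + FP + FN) over classes that actually occur."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                        # ground truth c, predicted otherwise
        fp = sum(cm[r][c] for r in range(n)) - tp   # predicted c, ground truth otherwise
        denom = tp + fp + fn
        if denom > 0:  # skip classes absent from both prediction and ground truth
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    # Toy example: four pixels, two classes.
    pred = [0, 0, 1, 1]
    target = [0, 1, 1, 1]
    cm = confusion_matrix(pred, target, num_classes=2)
    print(f"mIoU = {mean_iou(cm):.4f}")
```

In the toy example, class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12. Benchmark numbers are typically accumulated over the whole validation or test set in a single confusion matrix before averaging, rather than averaged image by image.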

Abstract

In the past decade, Convolutional Neural Networks (CNNs) and Transformers have been widely applied to semantic segmentation tasks. Although combining CNNs with Transformer models greatly improves performance, global context modeling remains inadequate. Recently, Mamba has shown great potential in vision tasks, demonstrating its advantages in modeling long-range dependencies. In this paper, we propose a lightweight Efficient CNN-Mamba Network for semantic segmentation, dubbed ECMNet. ECMNet combines CNN with Mamba in a capsule-based framework to address their complementary weaknesses. Specifically, we design an Enhanced Dual-Attention Block (EDAB) as a lightweight bottleneck. To improve the representational ability of features, we devise a Multi-Scale Attention Unit (MSAU) that integrates multi-scale feature aggregation, spatial aggregation, and channel aggregation. Moreover, a…

Taxonomy

Topics: Advanced Neural Network Applications · IoT and Edge/Fog Computing · Multimodal Machine Learning Applications

Methods: Absolute Position Encodings · Layer Normalization · Byte Pair Encoding · Label Smoothing · Softmax · Dropout · Dense Connections · Transformer · Attention Is All You Need · Mamba: Linear-Time Sequence Modeling with Selective State Spaces