MCFNet: Multi-scale Covariance Feature Fusion Network for Real-time Semantic Segmentation

Xiaojie Fang; Xingguo Song; Xiangyin Meng; Xu Fang; Sheng Jin

arXiv:2312.07207 · cs.CV · December 13, 2023 · 1 citation

Open Access

TL;DR

This paper introduces MCFNet, a real-time semantic segmentation network that fuses multi-scale features to recover spatial detail, reaching 75.5% mIoU on Cityscapes at 151.3 FPS.

Contribution

The paper proposes a multi-scale feature fusion architecture based on BiSeNet, with a new feature refinement module, a new feature fusion module, and a gating unit (L-Gate) that filters out invalid information, to improve spatial detail recovery in real-time segmentation.

Findings

01

Achieves 75.5% mIoU on the Cityscapes dataset.

02

Runs at 151.3 FPS, demonstrating real-time capability.

03

Outperforms several state-of-the-art methods in accuracy and speed.
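
The two headline numbers above can be unpacked with a little arithmetic. The following is a minimal sketch, not the paper's evaluation code: it assumes the common definition of per-class IoU as TP / (TP + FP + FN), averaged over classes, and converts a frame rate into per-frame latency. The function names `mean_iou` and `latency_ms` and the toy labels are hypothetical, chosen only for illustration.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union over flat lists of class labels.

    Assumes the common definition IoU_c = TP / (TP + FP + FN).
    Classes absent from both pred and target are skipped.
    """
    ious = []
    for c in range(num_classes):
        tp = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        fp = sum(1 for p, t in zip(pred, target) if p == c and t != c)
        fn = sum(1 for p, t in zip(pred, target) if p != c and t == c)
        denom = tp + fp + fn
        if denom == 0:
            continue  # class never appears; leave it out of the mean
        ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


def latency_ms(fps):
    """Per-frame latency in milliseconds for a given frame rate."""
    return 1000.0 / fps


# Toy example: six pixels, three classes (illustrative labels only).
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
print(mean_iou(pred, target, 3))      # per-class IoUs are 1/3, 2/3, 1/2
print(round(latency_ms(151.3), 2))    # 151.3 FPS is about 6.61 ms per frame
```

Benchmark evaluations typically accumulate the TP, FP, and FN counts over the whole test set before dividing, rather than averaging per-image scores as this toy example might suggest.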

Abstract

Both low-level spatial detail and high-level semantic abstraction are essential to semantic segmentation. Features extracted by a deep network carry rich semantic information, but much of the spatial information is lost along the way. How to recover spatial detail effectively and fuse it with high-level semantics has not yet been well addressed. In this paper, we propose a new architecture based on the Bilateral Segmentation Network (BiSeNet), called the Multi-scale Covariance Feature Fusion Network (MCFNet). Specifically, the network introduces a new feature refinement module and a new feature fusion module. Furthermore, a gating unit named L-Gate is proposed to filter out invalid information and fuse multi-scale features. We evaluate the proposed model on the Cityscapes and CamVid datasets and compare it with state-of-the-art methods.…

Taxonomy

Topics: Video Surveillance and Tracking Methods · Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings