CSFNet: A Cosine Similarity Fusion Network for Real-Time RGB-X Semantic Segmentation of Driving Scenes

Danial Qashqai; Emad Mousavian; Shahriar Baradaran Shokouhi; Sattar Mirzakuchaki

arXiv:2407.01328 · cs.CV · July 2, 2024

TL;DR

CSFNet is a real-time multimodal semantic segmentation model for driving scenes that fuses features using cosine similarity, achieving high accuracy and speed with low computational cost.

Contribution

The paper introduces CSFNet, a novel fusion network utilizing cosine similarity for efficient, real-time RGB-X semantic segmentation in autonomous driving.

Findings

01 · Achieves state-of-the-art speed among multimodal segmentation models.

02 · Maintains competitive accuracy with fewer parameters.

03 · Demonstrates effectiveness on multiple datasets.

Abstract

Semantic segmentation, as a crucial component of complex visual interpretation, plays a fundamental role in autonomous vehicle vision systems. Recent studies have significantly improved the accuracy of semantic segmentation by exploiting complementary information and developing multimodal methods. Despite the gains in accuracy, multimodal semantic segmentation methods suffer from high computational complexity and low inference speed. Therefore, it is a challenging task to implement multimodal methods in driving applications. To address this problem, we propose the Cosine Similarity Fusion Network (CSFNet) as a real-time RGB-X semantic segmentation model. Specifically, we design a Cosine Similarity Attention Fusion Module (CS-AFM) that effectively rectifies and fuses features of two modalities. The CS-AFM module leverages cross-modal similarity to achieve high generalization ability. By…
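The abstract does not spell out how CS-AFM computes or applies cross-modal similarity, so the following is only an illustrative sketch of the general idea of similarity-weighted fusion, not the authors' module. It assumes two feature maps of shape (C, H, W), computes one cosine similarity per channel, and uses that score as a gate when each modality borrows from the other. The function names, the [0, 1] rescaling, and the additive fusion rule are all assumptions made for this example.

```python
import numpy as np


def channel_cosine_similarity(rgb, x, eps=1e-8):
    """Cosine similarity per channel between two (C, H, W) feature maps."""
    c = rgb.shape[0]
    a = rgb.reshape(c, -1)  # flatten spatial dims: (C, H*W)
    b = x.reshape(c, -1)
    dot = (a * b).sum(axis=1)
    norms = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    return dot / (norms + eps)  # shape (C,), values in [-1, 1]


def cosine_similarity_fusion(rgb, x):
    """Hypothetical fusion rule (an assumption, not the paper's CS-AFM).

    Channels on which the two modalities agree exchange more information;
    channels on which they disagree are left mostly untouched.
    """
    sim = channel_cosine_similarity(rgb, x)
    gate = ((sim + 1.0) / 2.0)[:, None, None]  # rescale to [0, 1], broadcastable
    rgb_rectified = rgb + gate * x  # RGB branch borrows from X
    x_rectified = x + gate * rgb    # X branch borrows from RGB
    return rgb_rectified + x_rectified  # simple additive merge


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    rgb_feat = rng.standard_normal((4, 8, 8))
    depth_feat = rng.standard_normal((4, 8, 8))
    fused = cosine_similarity_fusion(rgb_feat, depth_feat)
    print(fused.shape)
```

Because the gate is a single scalar per channel, this sketch adds no learnable parameters and costs one dot product and two norms per channel, which is the kind of lightweight operation a real-time model would favor. The official PyTorch repository is the reference for the actual design.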

Code & Models

Repositories

Danial-Qashqai/CSFNet (PyTorch, official)

Taxonomy

Topics: Advanced Neural Network Applications · Medical Image Segmentation Techniques · Advanced Image and Video Retrieval Techniques

Methods: Softmax · Attention Is All You Need · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings