CLFT: Camera-LiDAR Fusion Transformer for Semantic Segmentation in Autonomous Driving

Junyi Gu, Mauro Bellone, Tomáš Pivoňka, and Raivo Sell

arXiv:2404.17793·cs.CV·September 10, 2024

TL;DR

This paper introduces CLFT, a vision-transformer-based camera-LiDAR fusion network for semantic segmentation in autonomous driving, demonstrating robustness and improved performance in challenging weather conditions.

Contribution

The paper presents a novel progressive-assemble and cross-fusion strategy for vision transformers in multimodal sensor fusion for autonomous driving.

Findings

1. Up to 10% improvement in dark-wet conditions over FCN-based fusion networks.

2. 5-10% overall improvement compared to single-modality transformer backbones.

3. Robust performance in rain and low-illumination conditions.

Abstract

Critical research about camera-and-LiDAR-based semantic object segmentation for autonomous driving significantly benefited from the recent development of deep learning. Specifically, the vision transformer is the novel ground-breaker that successfully brought the multi-head-attention mechanism to computer vision applications. Therefore, we propose a vision-transformer-based network to carry out camera-LiDAR fusion for semantic segmentation applied to autonomous driving. Our proposal uses the novel progressive-assemble strategy of vision transformers on a double-direction network and then integrates the results in a cross-fusion strategy over the transformer decoder layers. Unlike other works in the literature, our camera-LiDAR fusion transformers have been evaluated in challenging conditions like rain and low illumination, showing robust performance. The paper reports the segmentation…
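
To make the idea of fusing two sensor branches across decoder stages more concrete, here is a toy sketch in plain Python. It is not the authors' architecture: the function names (`add`, `cross_fuse`), the use of flat feature vectors, and the additive fusion rule are all assumptions chosen for illustration. It only shows how per-stage camera and LiDAR features might be combined so that each stage also carries forward the previously fused result.

```python
# Illustrative only: a toy cross-fusion over decoder stages.
# The names and the additive fusion rule are assumptions, not taken from the paper.

def add(a, b):
    """Element-wise sum of two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return [x + y for x, y in zip(a, b)]


def cross_fuse(camera_stages, lidar_stages):
    """Fuse per-stage camera and LiDAR features in stage order.

    Each fused output is camera + lidar + the previous fused output,
    so information from both modalities accumulates stage by stage.
    """
    if len(camera_stages) != len(lidar_stages):
        raise ValueError("both branches need the same number of stages")
    fused = None
    outputs = []
    for cam, lid in zip(camera_stages, lidar_stages):
        stage = add(cam, lid)          # combine the two modalities
        if fused is not None:
            stage = add(stage, fused)  # carry forward the earlier fusion
        fused = stage
        outputs.append(fused)
    return outputs


# Hypothetical two-stage features with two channels each.
camera = [[1.0, 2.0], [0.5, 0.5]]
lidar = [[0.0, 1.0], [1.5, -0.5]]
fused = cross_fuse(camera, lidar)
```

In a real network each stage would hold image-shaped feature maps and the fusion would involve learned layers; the running sum here stands in for that only to show the data flow.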

Taxonomy

Topics: Advanced Neural Network Applications

Methods: Attention Is All You Need · Linear Layer · Dense Connections · Layer Normalization · Multi-Head Attention · Residual Connection · Softmax · Vision Transformer
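
Several of the listed methods (softmax, multi-head attention) build on scaled dot-product attention. The sketch below shows a single attention head in plain Python, using the standard formulation softmax(QK^T / sqrt(d)) V. It is a generic illustration rather than code from the paper; the function names and the list-of-lists representation are assumptions, and a multi-head layer would run several such heads on learned projections and concatenate their outputs.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Single-head scaled dot-product attention.

    queries: list of vectors of length d
    keys:    list of vectors of length d
    values:  list of vectors (one per key)
    Returns one output vector per query.
    """
    d = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
            for k in keys
        ]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs


# Two identical keys receive equal weight, so the output is the mean of the values.
result = attention(
    queries=[[1.0, 0.0]],
    keys=[[1.0, 1.0], [1.0, 1.0]],
    values=[[0.0, 2.0], [4.0, 6.0]],
)
```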