A Novel Vision Transformer for Camera-LiDAR Fusion based Traffic Object Segmentation

Toomas Tahves; Junyi Gu; Mauro Bellone; Raivo Sell

arXiv:2501.02858·cs.CV·January 7, 2025

TL;DR

This paper introduces CLFT, a vision transformer-based model that fuses camera and LiDAR data for traffic object segmentation, improving perception in autonomous driving but facing challenges under adverse weather conditions.

Contribution

The paper proposes a novel Camera-LiDAR Fusion Transformer (CLFT) model that enhances traffic object segmentation by integrating multimodal data with transformer architectures.

Findings

1. Effective fusion of camera and LiDAR data improves segmentation accuracy.
2. Model performs well across diverse weather conditions but struggles in darkness and rain.
3. Advances state-of-the-art in multimodal traffic object segmentation.

Abstract

This paper presents Camera-LiDAR Fusion Transformer (CLFT) models for traffic object segmentation, which fuse camera and LiDAR data using vision transformers. Building on the methodology of vision transformers that exploit the self-attention mechanism, we extend segmentation capabilities with additional classification options covering a broad range of objects, including cyclists, traffic signs, and pedestrians, across diverse weather conditions. Despite good performance, the models face challenges under adverse conditions, which underscores the need for further optimization to enhance performance in darkness and rain. In summary, the CLFT models offer a compelling solution for autonomous driving perception, advancing the state of the art in multimodal fusion and object segmentation, with ongoing efforts required to address existing limitations and fully harness their…
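
The abstract names the self-attention mechanism and camera-LiDAR fusion but does not say how the two modalities are combined. The following is only a minimal sketch of one generic possibility, assuming that each modality has already been turned into per-patch token vectors and that fusion is done by concatenating the token lists and applying single-head self-attention. It is not the CLFT architecture: the function names, the toy inputs, and the omission of learned query/key/value projections are all assumptions made for illustration.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Assumption: queries, keys, and values are the tokens themselves
    (identity projections). A real transformer learns these projections.
    """
    d = len(tokens[0])
    out = []
    for q in tokens:
        # Similarity of this token to every token, scaled by sqrt(d).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        # Each output token is a weighted average of all input tokens.
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return out

def fuse(camera_tokens, lidar_tokens):
    """Hypothetical fusion step: concatenate both token lists, then attend."""
    return self_attention(camera_tokens + lidar_tokens)

# Hypothetical toy inputs: two camera patch tokens and two LiDAR patch tokens.
camera = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
lidar = [[0.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
fused = fuse(camera, lidar)
```

Because every output token is a weighted average over tokens from both modalities, each camera token can draw on LiDAR information and vice versa. That is the property a fusion transformer relies on, whatever the exact layer layout in the paper turns out to be.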

Taxonomy

Topics: Industrial Vision Systems and Defect Detection · Advanced Neural Network Applications · Advanced Measurement and Detection Methods

Methods: Attention Is All You Need · Byte Pair Encoding · Dense Connections · Absolute Position Encodings · Dropout · Linear Layer · Softmax · Adam · Residual Connection · Multi-Head Attention