FlatFusion: Delving into Details of Sparse Transformer-based Camera-LiDAR Fusion for Autonomous Driving
Yutao Zhu, Xiaosong Jia, Xinyu Yang, Junchi Yan

TL;DR
FlatFusion is a Transformer-based framework for sparse camera-LiDAR fusion in autonomous driving. It is built from a systematic exploration of design choices and outperforms existing methods in accuracy and efficiency.
Contribution
The paper provides a comprehensive analysis of design strategies for sparse Transformer-based sensor fusion and introduces FlatFusion, a new framework that achieves superior performance.
Findings
FlatFusion achieves 73.7 NDS on the nuScenes validation set.
It outperforms state-of-the-art sparse Transformer methods.
It operates at 10.1 FPS with PyTorch.
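
For readers unfamiliar with the metric, NDS (nuScenes Detection Score) combines mean average precision with five true-positive error terms (translation, scale, orientation, velocity, attribute), each clipped to 1. The sketch below follows the standard nuScenes definition; the function name and the input numbers are illustrative assumptions and are not results from the paper.

```python
def nuscenes_detection_score(mean_ap, tp_errors):
    """Compute NDS from mAP and the five mean true-positive errors.

    mean_ap:   mean average precision in [0, 1].
    tp_errors: five values (mATE, mASE, mAOE, mAVE, mAAE); each error is
               clipped to 1 and converted to a score via 1 - error.
    """
    if len(tp_errors) != 5:
        raise ValueError("expected exactly five true-positive error terms")
    tp_scores = [1.0 - min(1.0, err) for err in tp_errors]
    # mAP carries weight 5; each of the five TP scores carries weight 1.
    return (5.0 * mean_ap + sum(tp_scores)) / 10.0


# Illustrative (made-up) inputs, not values reported for FlatFusion.
example_nds = nuscenes_detection_score(0.70, [0.30, 0.25, 0.30, 0.25, 0.20])
```

The result is a fraction in [0, 1]; a score reported as 73.7 NDS corresponds to 0.737 on this scale.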
Abstract
The integration of data from diverse sensor modalities (e.g., camera and LiDAR) constitutes a prevalent methodology within the ambit of autonomous driving scenarios. Recent advancements in efficient point cloud transformers have underscored the efficacy of integrating information in sparse formats. When it comes to fusion, since image patches are dense in pixel space with ambiguous depth, it necessitates additional design considerations for effective fusion. In this paper, we conduct a comprehensive exploration of design choices for Transformer-based sparse camera-LiDAR fusion. This investigation encompasses strategies for image-to-3D and LiDAR-to-2D mapping, attention neighbor grouping, single modal tokenizer, and micro-structure of Transformer. By amalgamating the most effective principles uncovered through our investigation, we introduce FlatFusion, a carefully designed framework for…
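
As a point of reference for the "LiDAR-to-2D mapping" mentioned in the abstract, the generic way to place LiDAR points on an image is a pinhole projection through the camera extrinsics and intrinsics. The sketch below shows only that textbook operation; it is not FlatFusion's implementation, and the function name, argument names, and calibration values are assumptions made for illustration.

```python
import numpy as np


def project_lidar_to_image(points_lidar, lidar_to_cam, intrinsics, image_size):
    """Project LiDAR points onto an image plane with a pinhole model.

    points_lidar: (N, 3) xyz coordinates in the LiDAR frame.
    lidar_to_cam: (4, 4) homogeneous transform from LiDAR to camera frame.
    intrinsics:   (3, 3) camera matrix K.
    image_size:   (width, height) in pixels.
    Returns pixel coordinates (N, 2), depths (N,), and a validity mask (N,).
    """
    n = points_lidar.shape[0]
    homogeneous = np.hstack([points_lidar, np.ones((n, 1))])  # (N, 4)
    cam = (lidar_to_cam @ homogeneous.T).T[:, :3]             # camera frame
    depth = cam[:, 2]
    in_front = depth > 1e-6
    # Avoid dividing by zero or negative depth; such points are masked out.
    safe_depth = np.where(in_front, depth, 1.0)
    pix = (intrinsics @ cam.T).T
    uv = pix[:, :2] / safe_depth[:, None]
    width, height = image_size
    inside = (uv[:, 0] >= 0) & (uv[:, 0] < width) & (uv[:, 1] >= 0) & (uv[:, 1] < height)
    return uv, depth, in_front & inside


# Hypothetical calibration: identity extrinsics and a simple camera matrix.
K = np.array([[100.0, 0.0, 50.0],
              [0.0, 100.0, 50.0],
              [0.0, 0.0, 1.0]])
pts = np.array([[0.0, 0.0, 10.0],   # straight ahead
                [1.0, 0.0, 10.0],   # one metre to the side
                [0.0, 0.0, -5.0]])  # behind the camera
uv, depth, valid = project_lidar_to_image(pts, np.eye(4), K, (100, 100))
```

Each valid point receives a pixel location and an exact depth, which is the information that dense image patches lack on their own.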
Taxonomy
Topics: Robotics and Sensor-Based Localization · Remote Sensing and LiDAR Applications · Advanced Neural Network Applications
Methods: Sparse Evolutionary Training · Linear Layer · Layer Normalization · Multi-Head Attention · Attention Is All You Need · Position-Wise Feed-Forward Layer · Adam · Byte Pair Encoding · Softmax · Absolute Position Encodings
