Improved Semantic Stixels via Multimodal Sensor Fusion
Florian Piewak, Peter Pinggera, Markus Enzweiler, David Pfeiffer, Marius Zöllner

TL;DR
This paper introduces a multimodal sensor fusion approach that combines LiDAR and camera data to produce a compact, accurate, and semantically rich 3D scene representation with reduced data size and computational cost.
Contribution
It extends the Stixel model to incorporate multimodal data, improving geometric and semantic accuracy with minimal additional computation.
Findings
Enhanced 3D scene representation accuracy
Significantly reduced data volume with minimal information loss
Maintained low computational overhead
Abstract
This paper presents a compact and accurate representation of 3D scenes that are observed by a LiDAR sensor and a monocular camera. The proposed method is based on the well-established Stixel model originally developed for stereo vision applications. We extend this Stixel concept to incorporate data from multiple sensor modalities. The resulting mid-level fusion scheme takes full advantage of the geometric accuracy of LiDAR measurements as well as the high resolution and semantic detail of RGB images. The obtained environment model provides a geometrically and semantically consistent representation of the 3D scene at a significantly reduced amount of data while minimizing information loss at the same time. Since the different sensor modalities are considered as input to a joint optimization problem, the solution is obtained with only minor computational overhead. We demonstrate the…
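To make the idea of a joint optimization over both modalities concrete, here is a minimal sketch of how a single image column might be segmented into Stixels with dynamic programming. It is an illustration under stated assumptions, not the authors' formulation: the cost terms, the parameters `sigma` and `penalty`, and the names `Stixel`, `segment_cost`, and `segment_column` are all hypothetical, and it assumes a dense depth value per row, whereas real LiDAR measurements are sparse.

```python
import math
from dataclasses import dataclass


@dataclass
class Stixel:
    top: int      # first row of the segment (inclusive)
    bottom: int   # last row of the segment (inclusive)
    depth: float  # mean depth of the segment, in metres
    label: int    # index of the most likely semantic class


def segment_cost(depths, probs, i, j, sigma):
    """Joint cost of treating rows i..j as one Stixel (hypothetical model).

    Depth term: squared deviation of LiDAR depths from the segment mean.
    Semantic term: negative log-likelihood of the best single class,
    using per-row class probabilities derived from the camera image.
    """
    seg = depths[i:j + 1]
    mean = sum(seg) / len(seg)
    depth_cost = sum((d - mean) ** 2 for d in seg) / (sigma ** 2)

    best_label, best_sem = 0, math.inf
    for c in range(len(probs[0])):
        sem = -sum(math.log(max(probs[r][c], 1e-9)) for r in range(i, j + 1))
        if sem < best_sem:
            best_label, best_sem = c, sem
    return depth_cost + best_sem, mean, best_label


def segment_column(depths, probs, sigma=0.5, penalty=2.0):
    """Split one column into Stixels by minimizing the total joint cost.

    `penalty` is an assumed per-Stixel cost that discourages over-segmentation.
    """
    n = len(depths)
    best = [0.0] + [math.inf] * n  # best[k]: minimum cost of rows 0..k-1
    back = [None] * (n + 1)        # back[k]: last Stixel of that solution
    for j in range(n):
        for i in range(j + 1):
            cost, mean, label = segment_cost(depths, probs, i, j, sigma)
            total = best[i] + cost + penalty
            if total < best[j + 1]:
                best[j + 1] = total
                back[j + 1] = Stixel(i, j, mean, label)

    # Walk the back-pointers from the last row to recover the segmentation.
    stixels = []
    k = n
    while k > 0:
        stixels.append(back[k])
        k = back[k].top
    return stixels[::-1]


if __name__ == "__main__":
    depths = [10.0, 10.1, 9.9, 30.0, 30.2, 29.8]
    probs = [[0.9, 0.1]] * 3 + [[0.1, 0.9]] * 3
    for s in segment_column(depths, probs):
        print(s)
```

Because both the depth term and the semantic term enter the same per-segment cost, a single pass of the dynamic program accounts for both modalities, which is one way to read the abstract's remark that fusion adds only minor computational overhead.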
Taxonomy
Topics: Robotics and Sensor-Based Localization · Advanced Vision and Imaging · Advanced Neural Network Applications
