Deep Sensor Fusion with Pyramid Fusion Networks for 3D Semantic Segmentation
Hannah Schieber, Fabian Duerr, Torsten Schoen, Jürgen Beyerer

TL;DR
This paper introduces a pyramid-based deep fusion network that combines camera and lidar data at multiple scales to significantly improve 3D semantic segmentation accuracy in autonomous driving scenarios.
Contribution
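The novel Pyramid Fusion Backbone fuses camera and lidar feature maps at several scales and combines them in a feature pyramid; the Pyramid Fusion Head aggregates the pyramid features and refines them in a late fusion step. The resulting model outperforms recent lidar range view approaches in 3D semantic segmentation.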
Findings
Outperforms recent lidar range view approaches
Effective multi-scale multimodal feature fusion
Validated on challenging outdoor datasets
Abstract
Robust environment perception for autonomous vehicles is a tremendous challenge, which makes a diverse sensor set with e.g. camera, lidar and radar crucial. In the process of understanding the recorded sensor data, 3D semantic segmentation plays an important role. Therefore, this work presents a pyramid-based deep fusion architecture for lidar and camera to improve 3D semantic segmentation of traffic scenes. Individual sensor backbones extract feature maps of camera images and lidar point clouds. A novel Pyramid Fusion Backbone fuses these feature maps at different scales and combines the multimodal features in a feature pyramid to compute valuable multimodal, multi-scale features. The Pyramid Fusion Head aggregates these pyramid features and further refines them in a late fusion step, incorporating the final features of the sensor backbones. The approach is evaluated on two challenging…
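The data flow described in the abstract (per-scale fusion, a feature pyramid, then a head with late fusion) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the fusion operator (concatenation followed by a 1x1 projection and ReLU), the additive top-down pathway, the random weights, the tensor shapes, and all function names (`conv1x1`, `upsample_nearest`, `fuse_pyramid`, `fusion_head`) are assumptions chosen for illustration. It also assumes the camera features have already been aligned to the same grid as the lidar features.

```python
import numpy as np


def conv1x1(x, weight):
    """1x1 convolution: x is (C_in, H, W), weight is (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", weight, x)


def upsample_nearest(x, factor):
    """Nearest-neighbour upsampling of a (C, H, W) map by an integer factor."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)


def fuse_pyramid(lidar_feats, camera_feats, weights):
    """Fuse both modalities at every scale (finest first), then build a
    top-down pyramid. The fusion operator here is an assumption."""
    fused = [
        np.maximum(conv1x1(np.concatenate([l, c], axis=0), w), 0.0)
        for l, c, w in zip(lidar_feats, camera_feats, weights)
    ]
    pyramid = [fused[-1]]  # start from the coarsest fused map
    for level in reversed(fused[:-1]):
        factor = level.shape[1] // pyramid[0].shape[1]
        pyramid.insert(0, level + upsample_nearest(pyramid[0], factor))
    return pyramid


def fusion_head(pyramid, lidar_final, camera_final):
    """Aggregate all pyramid levels at the finest resolution and append the
    final backbone features of each sensor (the late fusion step)."""
    target_h = pyramid[0].shape[1]
    maps = list(pyramid) + [lidar_final, camera_final]
    resized = [upsample_nearest(m, target_h // m.shape[1]) for m in maps]
    return np.concatenate(resized, axis=0)


# Hypothetical example: three scales, 8 channels per modality and scale.
rng = np.random.default_rng(0)
shapes = [(8, 16, 64), (8, 8, 32), (8, 4, 16)]
lidar_feats = [rng.standard_normal(s) for s in shapes]
camera_feats = [rng.standard_normal(s) for s in shapes]
weights = [rng.standard_normal((8, 16)) for _ in shapes]

pyramid = fuse_pyramid(lidar_feats, camera_feats, weights)
head_features = fusion_head(pyramid, lidar_feats[-1], camera_feats[-1])
print(head_features.shape)
```

In this sketch the head output stacks three pyramid levels and two backbone outputs, so a per-pixel classifier applied on top would see both multi-scale fused features and the unfused final features of each sensor.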
Taxonomy
Topics: Advanced Optical Sensing Technologies · Advanced Neural Network Applications · Remote Sensing and LiDAR Applications
