Multi-Sem Fusion: Multimodal Semantic Fusion for 3D Object Detection

Shaoqing Xu; Fang Li; Ziying Song; Jin Fang; Sifen Wang; Zhi-Xin Yang

arXiv:2212.05265 · cs.CV · June 21, 2023 · 1 cites

Open Access

TL;DR

This paper introduces Multi-Sem Fusion, a multimodal semantic fusion framework that combines 2D image and 3D point cloud data with adaptive attention and deep feature fusion to significantly improve 3D object detection accuracy in autonomous driving.

Contribution

The paper proposes a novel multi-modal fusion framework with adaptive attention and deep feature fusion, addressing misalignment issues and enhancing detection performance over existing methods.

Findings

01. Achieves state-of-the-art results on the nuScenes benchmark.

02. Significantly outperforms methods using only point clouds or 2D images.

03. Demonstrates improved detection accuracy through semantic and deep feature fusion.

Abstract

LiDAR and camera fusion techniques are promising for 3D object detection in autonomous driving. Most multi-modal 3D object detection frameworks integrate semantic knowledge from 2D images into 3D LiDAR point clouds to enhance detection accuracy. Nevertheless, the restricted resolution of 2D feature maps impedes accurate re-projection and often induces a pronounced boundary-blurring effect, which is primarily attributed to erroneous semantic segmentation. To address this limitation, we propose a general multi-modal fusion framework, Multi-Sem Fusion (MSF), which fuses the semantic information from the scene-parsing results of both 2D images and 3D point clouds. Specifically, we employ 2D/3D semantic segmentation methods to generate the parsing results for 2D images and 3D point clouds. The 2D semantic information is then re-projected into the 3D point clouds using the calibration parameters.…
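
The re-projection step described in the abstract can be illustrated with a minimal sketch. It assumes a pinhole camera model, a 4x4 LiDAR-to-camera extrinsic matrix, and a 3x3 intrinsic matrix as the "calibration parameters". The function name `paint_points_with_2d_semantics` and all argument names are hypothetical and do not come from the paper; the sketch covers only the projection of 2D class scores onto points, not the paper's fusion modules.

```python
import numpy as np

def paint_points_with_2d_semantics(points_lidar, seg_scores, T_cam_from_lidar, K):
    """Attach per-pixel 2D class scores to LiDAR points by projection.

    Hypothetical helper (not from the paper).
    points_lidar:     (N, 3) xyz coordinates in the LiDAR frame
    seg_scores:       (H, W, C) per-pixel class scores from a 2D segmenter
    T_cam_from_lidar: (4, 4) extrinsic transform, LiDAR frame -> camera frame
    K:                (3, 3) camera intrinsic matrix (pinhole model assumed)
    Returns (N, C) scores (zeros where no pixel is hit) and an (N,) bool mask.
    """
    n = points_lidar.shape[0]
    h, w, c = seg_scores.shape

    # Homogeneous coordinates, then move the points into the camera frame.
    pts_h = np.hstack([points_lidar, np.ones((n, 1))])
    pts_cam = (T_cam_from_lidar @ pts_h.T).T[:, :3]

    # Only points in front of the camera can be projected.
    in_front = pts_cam[:, 2] > 1e-6

    # Perspective projection to pixel coordinates.
    uvw = (K @ pts_cam.T).T
    z = np.where(in_front, uvw[:, 2], 1.0)  # avoid dividing by zero or negative depth
    u = np.floor(uvw[:, 0] / z).astype(int)
    v = np.floor(uvw[:, 1] / z).astype(int)

    # Keep points that land inside the image bounds.
    valid = in_front & (u >= 0) & (u < w) & (v >= 0) & (v < h)

    point_scores = np.zeros((n, c), dtype=seg_scores.dtype)
    point_scores[valid] = seg_scores[v[valid], u[valid]]
    return point_scores, valid


# Toy usage: identity extrinsics, so the points are already in camera axes.
K = np.array([[100.0, 0.0, 50.0],
              [0.0, 100.0, 50.0],
              [0.0, 0.0, 1.0]])
T = np.eye(4)
seg = np.zeros((100, 100, 2))
seg[50, 50] = [0.2, 0.8]            # scores at the image centre
pts = np.array([[0.0, 0.0, 10.0],   # projects to pixel (50, 50)
                [0.0, 0.0, -5.0],   # behind the camera
                [100.0, 0.0, 1.0]]) # projects outside the image
scores, mask = paint_points_with_2d_semantics(pts, seg, T, K)
```

Because each point simply copies the scores of the pixel it lands on, any segmentation error near an object boundary is carried over to the point cloud, which is the boundary-blurring problem the abstract attributes to 2D-only semantics.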

Taxonomy

Topics: Advanced Neural Network Applications · Robotics and Sensor-Based Localization · Domain Adaptation and Few-Shot Learning