CMF-IoU: Multi-Stage Cross-Modal Fusion 3D Object Detection with IoU Joint Prediction
Zhiwei Ning, Zhaojiang Liu, Xuanang Gao, Yifan Zuo, Jie Yang, Yuming Fang, Wei Liu

TL;DR
This paper presents CMF-IoU, a multi-stage cross-modal fusion framework for 3D object detection that aligns 3D spatial (LiDAR) and 2D semantic (camera) information and reports improved accuracy on KITTI, nuScenes, and Waymo.
Contribution
The paper introduces a novel multi-stage fusion framework with a joint IoU prediction branch, unifying LiDAR and camera data for enhanced 3D detection performance.
Findings
Superior performance on KITTI, nuScenes, and Waymo datasets.
Effective alignment of 3D spatial and 2D semantic features.
Improved bounding box accuracy with IoU joint prediction.
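To make the IoU finding concrete, the quantity an IoU prediction branch would try to estimate can be sketched as below. This is only an illustration under simplifying assumptions: the boxes are axis-aligned and given as `(xmin, ymin, zmin, xmax, ymax, zmax)`, whereas 3D detectors typically use rotated boxes, and the function name `iou_3d_axis_aligned` is hypothetical. The summary does not state how CMF-IoU computes or uses the IoU target, so none of this should be read as the authors' formulation.

```python
from typing import Sequence


def iou_3d_axis_aligned(a: Sequence[float], b: Sequence[float]) -> float:
    """IoU of two axis-aligned 3D boxes given as (xmin, ymin, zmin, xmax, ymax, zmax).

    Assumption: no rotation (yaw) is handled; this is a simplified stand-in
    for the rotated-box IoU that 3D detectors normally work with.
    """
    intersection = 1.0
    for axis in range(3):
        low = max(a[axis], b[axis])
        high = min(a[axis + 3], b[axis + 3])
        if high <= low:
            # No overlap along this axis means no overlap at all.
            return 0.0
        intersection *= high - low

    volume_a = (a[3] - a[0]) * (a[4] - a[1]) * (a[5] - a[2])
    volume_b = (b[3] - b[0]) * (b[4] - b[1]) * (b[5] - b[2])
    return intersection / (volume_a + volume_b - intersection)


# A predicted box shifted by half its length along x, and a disjoint box.
ground_truth = (0.0, 0.0, 0.0, 2.0, 2.0, 2.0)
shifted = (1.0, 0.0, 0.0, 3.0, 2.0, 2.0)
disjoint = (5.0, 5.0, 5.0, 6.0, 6.0, 6.0)

iou_shifted = iou_3d_axis_aligned(ground_truth, shifted)
iou_same = iou_3d_axis_aligned(ground_truth, ground_truth)
iou_disjoint = iou_3d_axis_aligned(ground_truth, disjoint)
```

A score of this kind reflects how well a box is localized, which is information a classification confidence alone does not carry.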
Abstract
Multi-modal methods based on camera and LiDAR sensors have garnered significant attention in the field of 3D detection. However, many prevalent works focus on single or partial stage fusion, leading to insufficient feature extraction and suboptimal performance. In this paper, we introduce a multi-stage cross-modal fusion 3D detection framework, termed CMF-IoU, to effectively address the challenge of aligning 3D spatial and 2D semantic information. Specifically, we first project the pixel information into 3D space via a depth completion network to obtain pseudo points, which unifies the representation of the LiDAR and camera information. Then, a bilateral cross-view enhancement 3D backbone is designed to encode LiDAR points and pseudo points. The first sparse-to-distant (S2D) branch utilizes an encoder-decoder structure to reinforce the representation of sparse LiDAR points. The second…
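The pseudo-point step described in the abstract can be illustrated with a standard pinhole back-projection. The sketch below assumes a dense depth map is already available (in the paper it would come from a depth completion network, which is not modeled here) and that the camera intrinsics `fx`, `fy`, `cx`, `cy` are known. The function name `depth_to_pseudo_points` and the toy values are hypothetical, and the output is in the camera frame; a real pipeline would still need the camera-to-LiDAR extrinsics to place the points alongside the LiDAR sweep.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def depth_to_pseudo_points(
    depth: Sequence[Sequence[float]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Point]:
    """Back-project a dense depth map into camera-frame 3D points.

    depth[v][u] is the depth in meters at pixel column u, row v.
    Pixels with non-positive depth are treated as invalid and skipped.
    """
    points: List[Point] = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0.0:
                continue
            # Pinhole model: u = fx * x / z + cx, v = fy * y / z + cy.
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


# Toy 2x2 depth map with two valid pixels (values chosen for illustration).
toy_depth = [
    [0.0, 10.0],
    [20.0, 0.0],
]
pseudo_points = depth_to_pseudo_points(toy_depth, fx=2.0, fy=2.0, cx=0.5, cy=0.5)
```

Because every valid pixel yields a point, the resulting cloud is far denser than a LiDAR sweep, which is consistent with the abstract's use of a dedicated backbone to encode LiDAR points and pseudo points.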
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · 3D Surveying and Cultural Heritage
