MLF-DET: Multi-Level Fusion for Cross-Modal 3D Object Detection

Zewei Lin; Yanqing Shen; Sanping Zhou; Shitao Chen; Nanning Zheng

arXiv:2307.09155 · cs.CV · July 19, 2023 · 1 citation

Open Access

TL;DR

This paper introduces MLF-DET, a multi-level fusion network for cross-modal 3D object detection that combines feature-level and decision-level fusion with a new data augmentation strategy, achieving state-of-the-art results on the KITTI benchmark.
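
Decision-level fusion adjusts the score of an already-formed detection rather than its features. The abstract says MLF-DET does this with a lightweight FCR module that uses image semantics to rectify candidate confidences, but it does not spell out the rule. The Python sketch below is a hypothetical stand-in under stated assumptions: it averages an assumed per-pixel image score map over each candidate's projected 2D box and combines that cue with the LiDAR confidence through a weighted geometric mean. `pooled_image_score`, `rectify`, `rerank`, the score map, and `alpha` are illustrative names, not the paper's.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical types: a projected 2D box (x1, y1, x2, y2) in pixels and a
# per-pixel image "objectness" map with values in [0, 1].
Box2D = Tuple[float, float, float, float]
ScoreMap = Sequence[Sequence[float]]


def pooled_image_score(score_map: ScoreMap, box: Box2D) -> float:
    """Average the image score map over the pixels covered by a 2D box."""
    h, w = len(score_map), len(score_map[0])
    x1, y1 = max(0, int(box[0])), max(0, int(box[1]))
    x2, y2 = min(w, math.ceil(box[2])), min(h, math.ceil(box[3]))
    values = [score_map[y][x] for y in range(y1, y2) for x in range(x1, x2)]
    return sum(values) / len(values) if values else 0.0


def rectify(lidar_conf: float, image_score: float, alpha: float = 0.5) -> float:
    """Hypothetical rule: weighted geometric mean of LiDAR confidence and image cue."""
    eps = 1e-6  # avoid log(0) when either score is zero
    return math.exp((1 - alpha) * math.log(max(lidar_conf, eps))
                    + alpha * math.log(max(image_score, eps)))


def rerank(candidates: List[Tuple[float, Box2D]], score_map: ScoreMap,
           alpha: float = 0.5) -> List[Tuple[float, Box2D]]:
    """Rectify each (confidence, projected box) candidate and sort by the new score."""
    rectified = [(rectify(conf, pooled_image_score(score_map, box), alpha), box)
                 for conf, box in candidates]
    return sorted(rectified, key=lambda item: item[0], reverse=True)


if __name__ == "__main__":
    # 4x4 toy score map: the image "sees" an object only in the top-left 2x2 patch.
    score_map = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
    candidates = [(0.9, (2, 2, 4, 4)), (0.6, (0, 0, 2, 2))]
    for conf, box in rerank(candidates, score_map):
        print(f"{box}: {conf:.3f}")
```

In the toy run, the candidate with the higher LiDAR confidence (0.9) lies on a region the image map scores as empty, so after rectification it ranks below the 0.6 candidate. Setting `alpha` to 0 recovers the original LiDAR-only ranking.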

Contribution

The paper presents a new fusion network that pairs multi-scale voxel-image feature fusion (MVI) with image-cued confidence rectification (FCR), together with a new data augmentation strategy, Occlusion-aware GT Sampling (OGS), for improved 3D detection.
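
Per the abstract, MVI densely aligns multi-scale voxel features with image features. A common way to align a 3D feature with an image is to project the voxel center through the camera matrix and bilinearly sample the image feature map at the resulting pixel. The Python sketch below does this at every voxel scale and simply concatenates the two features; the 3x4 matrix `P`, the single shared image map, concatenation as the fusion operator, and the helper names (`project`, `bilinear`, `fuse_scale`) are assumptions for illustration, not MLF-DET's actual design.

```python
import math
from typing import List, Optional, Sequence, Tuple

Matrix = Sequence[Sequence[float]]                 # 3x4 camera projection matrix (assumed known)
FeatureMap = Sequence[Sequence[Sequence[float]]]   # H x W x C image features


def project(P: Matrix, xyz: Sequence[float]) -> Optional[Tuple[float, float]]:
    """Project a 3D point to pixel (u, v); None if it lies at or behind the camera."""
    x, y, z = xyz
    hom = [P[r][0] * x + P[r][1] * y + P[r][2] * z + P[r][3] for r in range(3)]
    if hom[2] <= 1e-6:
        return None
    return hom[0] / hom[2], hom[1] / hom[2]


def bilinear(feat: FeatureMap, u: float, v: float) -> List[float]:
    """Bilinearly sample an H x W x C map at continuous (u, v); out-of-range taps add zero."""
    h, w, c = len(feat), len(feat[0]), len(feat[0][0])
    x0, y0 = math.floor(u), math.floor(v)
    out = [0.0] * c
    for dy in (0, 1):
        for dx in (0, 1):
            xi, yi = x0 + dx, y0 + dy
            weight = (1 - abs(u - xi)) * (1 - abs(v - yi))
            if 0 <= xi < w and 0 <= yi < h:
                for k in range(c):
                    out[k] += weight * feat[yi][xi][k]
    return out


def fuse_scale(centers, voxel_feats, P: Matrix, image_feat: FeatureMap) -> List[List[float]]:
    """Hypothetical fusion: concatenate each voxel feature with the image feature at its projection."""
    c = len(image_feat[0][0])
    fused = []
    for xyz, vf in zip(centers, voxel_feats):
        uv = project(P, xyz)
        img = bilinear(image_feat, *uv) if uv is not None else [0.0] * c
        fused.append(list(vf) + img)
    return fused


if __name__ == "__main__":
    P = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]  # toy camera: unit focal length, no offset
    image_feat = [[[x + 10 * y] for x in range(3)] for y in range(3)]  # 3x3x1 toy map
    # Two hypothetical voxel scales: voxel centers and their toy features.
    scales = {
        "fine": ([(0.5, 0.5, 1.0), (1.0, 1.0, 1.0)], [[1.0], [2.0]]),
        "coarse": ([(1.0, 1.0, 1.0), (0.0, 0.0, -1.0)], [[3.0], [4.0]]),
    }
    for name, (centers, feats) in scales.items():
        print(name, fuse_scale(centers, feats, P, image_feat))
```

Each scale is fused independently, so finer and coarser voxels each receive their own image samples; a voxel behind the camera gets a zero image feature instead of a spurious sample.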

Findings

01

Achieves 82.89% moderate AP on the KITTI car benchmark

02

Outperforms existing methods without bells and whistles

03

Demonstrates the effectiveness of multi-level fusion and data augmentation

Abstract

In this paper, we propose a novel and effective Multi-Level Fusion network, named MLF-DET, for high-performance cross-modal 3D object DETection, which integrates both feature-level fusion and decision-level fusion to fully utilize the information in the image. For the feature-level fusion, we present the Multi-scale Voxel Image fusion (MVI) module, which densely aligns multi-scale voxel features with image features. For the decision-level fusion, we propose the lightweight Feature-cued Confidence Rectification (FCR) module, which further exploits image semantics to rectify the confidence of detection candidates. In addition, we design an effective data augmentation strategy, termed Occlusion-aware GT Sampling (OGS), to retain more sampled objects in the training scenes and thereby reduce overfitting. Extensive experiments on the KITTI dataset demonstrate the effectiveness of our method.…
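
GT sampling pastes ground-truth objects from other scenes into a training scene. In a cross-modal setting the pasted objects must also stay consistent with the image, and one simple, assumed baseline is to discard any pasted object whose projection overlaps another object, which throws many samples away. The abstract says only that OGS is occlusion-aware and keeps more sampled objects. The Python sketch below is a hypothetical stand-in, not the paper's algorithm: it processes pasted objects nearest first and drops one only when nearer objects hide most of its projected 2D box. `SceneObject`, `min_visible`, and both filter functions are illustrative names.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple

Box2D = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in integer pixels, x2/y2 exclusive


@dataclass
class SceneObject:
    name: str
    box2d: Box2D   # projected 2D box in the camera image
    depth: float   # distance from the camera; smaller means nearer
    sampled: bool  # True if pasted by GT sampling, False if already in the scene


def pixels(box: Box2D) -> Set[Tuple[int, int]]:
    x1, y1, x2, y2 = box
    return {(x, y) for x in range(x1, x2) for y in range(y1, y2)}


def visible_fraction(obj: SceneObject, others: Iterable[SceneObject]) -> float:
    """Fraction of obj's 2D box not covered by any nearer object in `others`."""
    own = pixels(obj.box2d)
    if not own:
        return 0.0
    covered: Set[Tuple[int, int]] = set()
    for other in others:
        if other is not obj and other.depth < obj.depth:
            covered |= own & pixels(other.box2d)
    return 1.0 - len(covered) / len(own)


def naive_filter(objs: List[SceneObject]) -> List[SceneObject]:
    """Assumed occlusion-unaware baseline: drop every sampled object whose 2D box overlaps another."""
    return [o for o in objs
            if not o.sampled
            or not any(p is not o and pixels(o.box2d) & pixels(p.box2d) for p in objs)]


def occlusion_aware_filter(objs: List[SceneObject], min_visible: float = 0.3) -> List[SceneObject]:
    """Hypothetical rule: keep a sampled object unless nearer kept objects hide most of its box."""
    kept = [o for o in objs if not o.sampled]  # original scene objects are never removed
    for o in sorted((o for o in objs if o.sampled), key=lambda o: o.depth):  # nearest first
        if visible_fraction(o, kept) >= min_visible:
            kept.append(o)
    return kept


if __name__ == "__main__":
    scene = [
        SceneObject("A", (0, 0, 10, 10), 10.0, sampled=False),
        SceneObject("B", (5, 0, 15, 10), 20.0, sampled=True),   # half hidden behind A
        SceneObject("C", (20, 0, 30, 10), 15.0, sampled=True),  # fully visible
        SceneObject("D", (2, 2, 8, 8), 30.0, sampled=True),     # fully hidden behind A
    ]
    print("naive:", [o.name for o in naive_filter(scene)])
    print("occlusion-aware:", [o.name for o in occlusion_aware_filter(scene)])
```

On the toy scene, the baseline keeps only A and C, while the occlusion-aware rule also keeps B, which is half hidden behind A; D, fully behind A, is dropped by both. The sketch ignores the LiDAR points and image pixels of occluded parts, which a real pipeline would also have to handle.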

Taxonomy

Topics: Advanced Neural Network Applications · Visual Attention and Saliency Detection · Advanced Image and Video Retrieval Techniques