TL;DR
CoreNet introduces a novel multi-modality fusion approach for 3D object detection that effectively resolves point-pixel misalignment and sub-task suppression, significantly improving detection accuracy on the nuScenes dataset.
Contribution
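The paper proposes CoreNet, a conflict resolution network that combines a dual-stream transformation module (ray-based and point-based 2D-to-BEV transformations) with task-specific predictors to address two conflicts in multi-modality 3D detection: point-pixel misalignment and sub-task suppression.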
Findings
Achieves 75.6% NDS and 73.3% mAP on the nuScenes test set.
Effectively resolves point-pixel misalignment and sub-task suppression.
Demonstrates superior performance over existing methods.
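For context on the two headline numbers: the nuScenes detection score (NDS) combines mAP with five true-positive error metrics (mATE, mASE, mAOE, mAVE, mAAE). The sketch below follows the public nuScenes definition of NDS. The function names are illustrative, and the per-metric errors for CoreNet are not listed on this page, so the example only derives the average true-positive score implied by the reported 75.6% NDS and 73.3% mAP.

```python
def nds(mean_ap, tp_errors):
    """nuScenes detection score: (5 * mAP + sum(1 - min(1, err))) / 10."""
    if len(tp_errors) != 5:
        raise ValueError("expected 5 TP errors (mATE, mASE, mAOE, mAVE, mAAE)")
    # Each error is clipped at 1 and converted to a score in [0, 1].
    tp_scores = [1.0 - min(1.0, err) for err in tp_errors]
    return (5.0 * mean_ap + sum(tp_scores)) / 10.0


def implied_mean_tp_score(nds_value, mean_ap):
    """Invert the NDS formula to get the average of the five TP scores."""
    return (10.0 * nds_value - 5.0 * mean_ap) / 5.0


# Values reported in the findings above, expressed as fractions.
mean_tp = implied_mean_tp_score(0.756, 0.733)
print(round(mean_tp, 3))  # prints 0.779
```

Because mAP carries half of the total weight, the gap between NDS and mAP here means the averaged true-positive scores (about 0.78) are slightly higher than the mAP itself.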
Abstract
Fusing multi-modality inputs from different sensors is an effective way to improve the performance of 3D object detection. However, current methods overlook two important conflicts: point-pixel misalignment and sub-task suppression. The former means a pixel feature from the opaque object is projected to multiple point features of the same ray in the world space, and the latter means the classification prediction and bounding box regression may cause mutual suppression. In this paper, we propose a novel method named Conflict Resolution Network (CoreNet) to address the aforementioned issues. Specifically, we first propose a dual-stream transformation module to tackle point-pixel misalignment. It consists of ray-based and point-based 2D-to-BEV transformations. Both of them achieve approximately unique mapping from the image space to the world space. Moreover, we introduce a task-specific…
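The point-pixel misalignment described in the abstract can be illustrated with a small geometric sketch. This is not the CoreNet implementation: it assumes a simple pinhole camera, hypothetical intrinsics `K`, a hypothetical set of candidate depths, and a BEV grid laid out in the camera frame. It only shows that a single pixel, lifted at several depths, lands in several BEV cells along one ray, even though an opaque object occupies just one of them.

```python
import numpy as np


def backproject(pixel_uv, depths, K):
    """Lift one pixel to 3D camera-frame points, one per candidate depth."""
    u, v = pixel_uv
    # Ray direction through the pixel, normalized so that z = 1.
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])
    return np.outer(depths, ray)  # shape (num_depths, 3)


def bev_cells(points_cam, cell_size):
    """Map camera-frame points (x right, z forward) to BEV grid indices."""
    xz = points_cam[:, [0, 2]]
    return np.floor(xz / cell_size).astype(int)


# Hypothetical intrinsics and depth candidates, chosen only for illustration.
K = np.array([[1000.0, 0.0, 800.0],
              [0.0, 1000.0, 450.0],
              [0.0, 0.0, 1.0]])
depths = np.array([5.0, 10.0, 20.0, 40.0])

points = backproject((1000.0, 450.0), depths, K)
cells = bev_cells(points, cell_size=0.5)

# One pixel feature ends up in as many distinct BEV cells as there are depths.
num_cells = len({tuple(c) for c in cells})
print(num_cells)
```

All four lifted points reproject to the same pixel, which is why a transformation that maps each pixel to an approximately unique location, as the ray-based and point-based streams are described as doing, removes the ambiguity.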
Taxonomy
Methods: Sparse Evolutionary Training
