Voxel Field Fusion for 3D Object Detection
Yanwei Li, Xiaojuan Qi, Yukang Chen, Liwei Wang, Zeming Li, Jian Sun, Jiaya Jia

TL;DR
This paper introduces voxel field fusion, a conceptually simple framework for cross-modality 3D object detection that represents image features as rays in a voxel field and fuses them there, improving detection accuracy on KITTI and nuScenes.
Contribution
The paper proposes a novel voxel field fusion framework with a learnable sampler and ray-wise fusion to enhance cross-modality 3D detection performance.
Findings
Achieves consistent improvements on KITTI and nuScenes datasets.
Outperforms previous fusion-based methods in 3D object detection.
Demonstrates the effectiveness of voxel field fusion in maintaining modality consistency.
Abstract
In this work, we present a conceptually simple yet effective framework for cross-modality 3D object detection, named voxel field fusion. The proposed approach aims to maintain cross-modality consistency by representing and fusing augmented image features as a ray in the voxel field. To this end, the learnable sampler is first designed to sample vital features from the image plane that are projected to the voxel grid in a point-to-ray manner, which maintains the consistency in feature representation with spatial context. In addition, ray-wise fusion is conducted to fuse features with the supplemental context in the constructed voxel field. We further develop mixed augmentor to align feature-variant transformations, which bridges the modality gap in data augmentation. The proposed framework is demonstrated to achieve consistent gains in various benchmarks and outperforms previous…
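The point-to-ray projection and ray-wise fusion described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a pinhole camera whose frame coincides with the voxel grid frame, uses fixed depth samples in place of the learnable sampler, and substitutes plain additive fusion for the paper's ray-wise fusion. The names `pixel_to_ray_voxels` and `fuse_ray` and all parameters are hypothetical.

```python
import numpy as np


def pixel_to_ray_voxels(uv, K, depths, grid_min, voxel_size, grid_shape):
    """Return the voxel indices hit by the camera ray through pixel `uv`.

    Assumes a pinhole camera with intrinsics K whose frame is also the
    voxel grid frame (an assumption made for this sketch only).
    """
    u, v = uv
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    # Back-project the pixel to a ray direction at unit depth.
    direction = np.array([(u - cx) / fx, (v - cy) / fy, 1.0])
    # Sample 3D points along the ray at the given depths.
    points = depths[:, None] * direction[None, :]
    # Convert metric coordinates to integer voxel indices.
    idx = np.floor((points - grid_min) / voxel_size).astype(int)
    # Keep only the samples that fall inside the grid.
    inside = np.all((idx >= 0) & (idx < np.array(grid_shape)), axis=1)
    idx = idx[inside]
    # Drop duplicate voxels while preserving the order along the ray.
    _, first = np.unique(idx, axis=0, return_index=True)
    return idx[np.sort(first)]


def fuse_ray(voxel_feats, ray_idx, image_feat):
    """Add one image feature to every voxel on the ray (additive stand-in)."""
    fused = voxel_feats.copy()
    fused[ray_idx[:, 0], ray_idx[:, 1], ray_idx[:, 2]] += image_feat
    return fused


# Toy setup: a 4x4x4 grid of 1 m voxels in front of the camera.
K = np.array([[100.0, 0.0, 50.0],
              [0.0, 100.0, 50.0],
              [0.0, 0.0, 1.0]])
depths = np.array([0.5, 1.5, 2.5])
grid_min = np.array([-2.0, -2.0, 0.0])
grid_shape = (4, 4, 4)

# The principal-point pixel maps to a ray along the optical axis.
ray_idx = pixel_to_ray_voxels((50.0, 50.0), K, depths, grid_min, 1.0, grid_shape)

voxel_feats = np.zeros(grid_shape + (2,))
image_feat = np.array([1.0, 2.0])
fused = fuse_ray(voxel_feats, ray_idx, image_feat)
```

The sketch shows why a single image point touches several voxels: without a known depth, the pixel constrains only a direction, so its feature is spread along the whole ray instead of being assigned to one 3D location.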
Taxonomy
Topics: Advanced Neural Network Applications · Robotics and Sensor-Based Localization · Advanced Image and Video Retrieval Techniques
Methods: ALIGN
