Fusion4CA: Boosting 3D Object Detection via Comprehensive Image Exploitation
Kang Luo, Xin Chen, Yangyi Xiao, Hesheng Wang

TL;DR
Fusion4CA improves LiDAR-camera 3D object detection by exploiting RGB data more fully. It adds plug-and-play modules for feature alignment and fusion that raise accuracy while increasing inference parameters by only 3.48%.
Contribution
The paper introduces Fusion4CA, a framework built on BEVFusion whose plug-and-play components exploit RGB data more fully than prior LiDAR-camera fusion methods, which rely heavily on the LiDAR branch.
Findings
Achieves 69.7% mAP on nuScenes with only 6 training epochs
Improves baseline performance by 1.2% mAP
Increases inference parameters by only 3.48%
Abstract
Nowadays, an increasing number of works fuse LiDAR and RGB data in the bird's-eye view (BEV) space for 3D object detection in autonomous driving systems. However, existing methods suffer from over-reliance on the LiDAR branch, with insufficient exploration of RGB information. To tackle this issue, we propose Fusion4CA, which is built upon the classic BEVFusion framework and dedicated to fully exploiting visual input with plug-and-play components. Specifically, a contrastive alignment module is designed to calibrate image features with 3D geometry, and a camera auxiliary branch is introduced to mine RGB information sufficiently during training. For further performance enhancement, we leverage an off-the-shelf cognitive adapter to make the most of pretrained image weights, and integrate a standard coordinate attention module into the fusion stage as a supplementary boost. Experiments on…
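The abstract mentions a standard coordinate attention module in the fusion stage. As a rough illustration of what such a module generally computes, the NumPy sketch below pools a feature map along each spatial axis, squeezes the channels with a shared 1x1 projection, and produces per-row and per-column gates that rescale the input. It is not the authors' implementation: the function name, the weight shapes, the random weights, and the omission of batch normalization are all assumptions made for this example.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def hard_swish(z):
    # h-swish non-linearity: z * ReLU6(z + 3) / 6
    return z * np.clip(z + 3.0, 0.0, 6.0) / 6.0


def coordinate_attention(x, w_reduce, w_h, w_w):
    """Apply coordinate attention to a single feature map (hypothetical helper).

    x        : (C, H, W) feature map, e.g. a fused BEV feature
    w_reduce : (C_mid, C) shared 1x1 projection that squeezes channels
    w_h, w_w : (C, C_mid) 1x1 projections producing row / column gates
    """
    _, h, _ = x.shape
    pooled_h = x.mean(axis=2)  # (C, H): average over the width axis
    pooled_w = x.mean(axis=1)  # (C, W): average over the height axis

    # Concatenate both directional descriptors and squeeze the channels.
    # Batch normalization is omitted in this sketch.
    y = np.concatenate([pooled_h, pooled_w], axis=1)  # (C, H + W)
    y = hard_swish(w_reduce @ y)                      # (C_mid, H + W)

    # Split back into the two directions and build gates in (0, 1).
    gate_h = sigmoid(w_h @ y[:, :h])  # (C, H)
    gate_w = sigmoid(w_w @ y[:, h:])  # (C, W)

    # Each position (i, j) is scaled by its row gate and its column gate.
    return x * gate_h[:, :, None] * gate_w[:, None, :]


rng = np.random.default_rng(0)
C, H, W, C_MID = 8, 4, 6, 2  # illustrative sizes only
x = rng.standard_normal((C, H, W))
out = coordinate_attention(
    x,
    rng.standard_normal((C_MID, C)) * 0.1,
    rng.standard_normal((C, C_MID)) * 0.1,
    rng.standard_normal((C, C_MID)) * 0.1,
)
print(out.shape)  # (8, 4, 6)
```

Because the gates depend on row and column position separately, the module can emphasize specific locations while adding only a few small projection matrices, which is consistent with the small parameter increase reported above.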
Taxonomy
Topics: Advanced Neural Network Applications · Robotics and Sensor-Based Localization · Advanced Image and Video Retrieval Techniques
