LoGoNet: Towards Accurate 3D Object Detection with Local-to-Global Cross-Modal Fusion
Xin Li, Tao Ma, Yuenan Hou, Botian Shi, Yuchen Yang, Youquan Liu, Xingjiao Wu, Qin Chen, Yikang Li, Yu Qiao, Liang He

TL;DR
LoGoNet introduces a novel local-to-global fusion approach for LiDAR-camera data, significantly improving 3D object detection accuracy by combining fine-grained region-level and scene-level features.
Contribution
The paper proposes LoGoNet, a fusion network that integrates local and global features using point centroids and grid-based image sampling, advancing multi-modal 3D detection.
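To make the "point centroids" idea concrete, here is a minimal sketch, not the authors' implementation. It assumes a simple uniform voxel grid and plain Python tuples; the function name `voxel_centroids` and its arguments are hypothetical. It shows that the mean of the points inside a voxel can sit far from the voxel's geometric center, which is why a centroid may be a better anchor when looking up the matching image location.

```python
from collections import defaultdict

def voxel_centroids(points, voxel_size):
    """Group 3D points by voxel index and return the mean point of each voxel.

    points: iterable of (x, y, z) tuples.
    voxel_size: (vx, vy, vz) edge lengths of one voxel (assumed uniform grid).
    Returns a dict mapping an integer voxel index to its centroid (x, y, z).
    """
    # Per-voxel running sums of x, y, z plus a point count.
    sums = defaultdict(lambda: [0.0, 0.0, 0.0, 0])
    for x, y, z in points:
        # Floor division assigns each point to the voxel that contains it.
        key = (int(x // voxel_size[0]),
               int(y // voxel_size[1]),
               int(z // voxel_size[2]))
        acc = sums[key]
        acc[0] += x
        acc[1] += y
        acc[2] += z
        acc[3] += 1
    return {k: (sx / n, sy / n, sz / n) for k, (sx, sy, sz, n) in sums.items()}
```

For a 1 m voxel whose points all lie near one corner, the centroid stays near that corner, whereas the voxel center would be reported at the middle of the cell regardless of where the points fall.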
Findings
Reports state-of-the-art results on the Waymo Open Dataset and KITTI.
Ranks 1st on the Waymo 3D object detection leaderboard at the time of publication.
Surpasses 80 APH (L2) on three classes (vehicle, pedestrian, cyclist) simultaneously.
Abstract
LiDAR-camera fusion methods have shown impressive performance in 3D object detection. Recent advanced multi-modal methods mainly perform global fusion, where image features and point cloud features are fused across the whole scene. Such practice lacks fine-grained region-level information, yielding suboptimal fusion performance. In this paper, we present the novel Local-to-Global fusion network (LoGoNet), which performs LiDAR-camera fusion at both local and global levels. Concretely, the Global Fusion (GoF) of LoGoNet is built upon previous literature, while we exclusively use point centroids to more precisely represent the position of voxel features, thus achieving better cross-modal alignment. As to the Local Fusion (LoF), we first divide each proposal into uniform grids and then project these grid centers to the images. The image features around the projected grid points are sampled…
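The first two Local Fusion steps described above (dividing a proposal into uniform grids and projecting the grid centers to the image) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the paper's code: the box is parameterized as center, size, and yaw about the z axis; `P` is a hypothetical 3x4 projection matrix that already includes any LiDAR-to-camera transform; and the function names and the default `grid=3` are made up for the example. How image features are then sampled around the projected points is not covered here.

```python
import math

def proposal_grid_centers(center, size, yaw, grid=3):
    """Return the centers of a grid x grid x grid uniform partition of a 3D box.

    center: (cx, cy, cz); size: (length, width, height);
    yaw: rotation about the z axis in radians.
    """
    cx, cy, cz = center
    length, width, height = size
    cos_y, sin_y = math.cos(yaw), math.sin(yaw)
    # Normalized cell-center offsets in (-0.5, 0.5); grid=2 gives [-0.25, 0.25].
    offsets = [(i + 0.5) / grid - 0.5 for i in range(grid)]
    centers = []
    for ox in offsets:
        for oy in offsets:
            for oz in offsets:
                # Offset in the box frame, then rotate about z into the scene frame.
                dx, dy, dz = ox * length, oy * width, oz * height
                centers.append((cx + dx * cos_y - dy * sin_y,
                                cy + dx * sin_y + dy * cos_y,
                                cz + dz))
    return centers

def project_to_image(point, P):
    """Project a 3D point with a 3x4 matrix P.

    Returns pixel coordinates (u, v), or None if the point has non-positive depth.
    """
    x, y, z = point
    hom = [row[0] * x + row[1] * y + row[2] * z + row[3] for row in P]
    if hom[2] <= 0:
        return None
    return (hom[0] / hom[2], hom[1] / hom[2])
```

A typical use would be to call `proposal_grid_centers` once per proposal and map `project_to_image` over the result, discarding `None` entries and any pixel that falls outside the image bounds before sampling features.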
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Robotics and Sensor-Based Localization
