GVSynergy-Det: Synergistic Gaussian-Voxel Representations for Multi-View 3D Object Detection
Yi Zhang, Yi Wang, Lei Yao, Lap-Pui Chau

TL;DR
GVSynergy-Det is a dual-representation framework that combines Gaussian and voxel features, which capture complementary geometric information, to improve multi-view 3D object detection accuracy without dense 3D supervision.
Contribution
The paper proposes a novel synergistic Gaussian-voxel representation learning framework that enhances 3D detection accuracy without requiring dense 3D supervision.
Findings
Achieves state-of-the-art results on ScanNetV2 and ARKitScenes datasets.
Operates effectively without dense 3D supervision or point cloud data.
Demonstrates the effectiveness of dual-representation synergy in geometric feature extraction.
Abstract
Image-based 3D object detection aims to identify and localize objects in 3D space using only RGB images, eliminating the need for expensive depth sensors required by point cloud-based methods. Existing image-based approaches face two critical challenges: methods achieving high accuracy typically require dense 3D supervision, while those operating without such supervision struggle to extract accurate geometry from images alone. In this paper, we present GVSynergy-Det, a novel framework that enhances 3D detection through synergistic Gaussian-Voxel representation learning. Our key insight is that continuous Gaussian and discrete voxel representations capture complementary geometric information: Gaussians excel at modeling fine-grained surface details while voxels provide structured spatial context. We introduce a dual-representation architecture that: 1) adapts generalizable Gaussian…
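To make the idea of complementary representations concrete, here is a minimal NumPy sketch of one generic way continuous Gaussian primitives could be pooled into a discrete voxel grid and concatenated with existing voxel features. It is an illustration only and not the paper's architecture: the function name `gaussians_to_voxels`, the opacity-weighted averaging, the concatenation step, and all shapes and values are assumptions made for this example.

```python
import numpy as np

def gaussians_to_voxels(centers, opacities, feats, grid_min, voxel_size, grid_shape):
    """Hypothetical helper: opacity-weighted mean of Gaussian features per voxel.

    centers: (N, 3) Gaussian means, opacities: (N,), feats: (N, C).
    Returns an array of shape grid_shape + (C,); empty voxels stay zero.
    """
    # Map each Gaussian center to an integer voxel index.
    idx = np.floor((centers - grid_min) / voxel_size).astype(int)
    # Drop Gaussians whose centers fall outside the grid.
    inside = np.all((idx >= 0) & (idx < np.array(grid_shape)), axis=1)
    idx, w, f = idx[inside], opacities[inside], feats[inside]

    flat = np.ravel_multi_index(tuple(idx.T), grid_shape)
    n_vox = int(np.prod(grid_shape))
    num = np.zeros((n_vox, feats.shape[1]))
    den = np.zeros(n_vox)
    # Unbuffered accumulation so several Gaussians can share one voxel.
    np.add.at(num, flat, f * w[:, None])
    np.add.at(den, flat, w)

    out = np.zeros_like(num)
    filled = den > 0
    out[filled] = num[filled] / den[filled][:, None]
    return out.reshape(*grid_shape, feats.shape[1])

# Toy inputs (made up for illustration): four Gaussians, a 2x2x2 grid.
centers = np.array([[0.1, 0.1, 0.1], [0.2, 0.2, 0.2], [1.5, 0.5, 0.5], [5.0, 5.0, 5.0]])
opacities = np.array([1.0, 3.0, 2.0, 1.0])
gauss_feats = np.array([[1.0, 0.0], [5.0, 4.0], [2.0, 2.0], [9.0, 9.0]])
grid_shape = (2, 2, 2)

gauss_grid = gaussians_to_voxels(
    centers, opacities, gauss_feats, np.zeros(3), 1.0, grid_shape
)

# Assumed fusion step: concatenate with voxel features along the channel axis.
voxel_feats = np.ones(grid_shape + (3,))
fused = np.concatenate([voxel_feats, gauss_grid], axis=-1)
```

In this toy setup the first two Gaussians share a voxel and are averaged by opacity, while the last one lies outside the grid and is ignored. A real system would likely use learned fusion rather than plain concatenation; that choice is kept deliberately simple here.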
Taxonomy
Topics: Advanced Neural Network Applications · Robotics and Sensor-Based Localization · 3D Shape Modeling and Analysis
