VoteSplat: Hough Voting Gaussian Splatting for 3D Scene Understanding

Minchao Jiang; Shunyu Jia; Jiaming Gu; Xiaoyuan Lu; Guangming Zhu; Anqi Dong; Liang Zhang

arXiv:2506.22799·cs.GR·July 1, 2025

TL;DR

VoteSplat introduces a novel framework combining Hough voting with 3D Gaussian Splatting, enhancing 3D scene understanding and object localization while reducing training costs in high-dimensional semantic spaces.

Contribution

It integrates Hough voting with 3D Gaussian Splatting and utilizes SAM for instance segmentation, enabling open-vocabulary 3D object localization with lower training costs.

Findings

01 Effective open-vocabulary 3D instance localization

02 Improved 3D point cloud understanding

03 Reduced training costs for semantic mapping

Abstract

3D Gaussian Splatting (3DGS) has become a workhorse for high-quality, real-time rendering in novel view synthesis of 3D scenes. However, existing methods focus primarily on geometric and appearance modeling and lack deeper scene understanding, while those that add it incur high training costs that complicate the originally streamlined differentiable rendering pipeline. To this end, we propose VoteSplat, a novel 3D scene understanding framework that integrates Hough voting with 3DGS. Specifically, the Segment Anything Model (SAM) is used for instance segmentation, extracting objects and generating 2D vote maps. We then embed spatial offset vectors into Gaussian primitives. These offsets construct 3D spatial votes by associating them with 2D image votes, while depth distortion constraints refine localization along the depth axis. For open-vocabulary object localization, VoteSplat maps 2D image…
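The voting idea in the abstract can be illustrated with a small, generic Hough-voting sketch. This is not the authors' implementation: the excerpt does not say how votes are aggregated, so the voxel-grid accumulator, the function names (cast_votes, accumulate_votes, find_peaks), and the parameters cell_size and min_votes are all assumptions made for illustration. The only part taken from the text is that each Gaussian primitive carries a spatial offset vector and that its vote is its position plus that offset.

```python
import numpy as np


def cast_votes(centers, offsets):
    """Each primitive votes for an object center at (position + offset)."""
    return centers + offsets


def accumulate_votes(votes, cell_size):
    """Bin votes into a voxel grid (a simple Hough accumulator) and count them.

    Returns the occupied cells, the cell index of every vote, and per-cell counts.
    The voxel grid is an assumption, not something the source describes.
    """
    cells = np.floor(votes / cell_size).astype(np.int64)
    unique_cells, inverse, counts = np.unique(
        cells, axis=0, return_inverse=True, return_counts=True
    )
    # reshape(-1) keeps the inverse index 1-D across NumPy versions.
    return unique_cells, inverse.reshape(-1), counts


def find_peaks(votes, cell_size, min_votes):
    """Return the mean vote of every cell that collected at least min_votes votes."""
    _, inverse, counts = accumulate_votes(votes, cell_size)
    peaks = [votes[inverse == idx].mean(axis=0)
             for idx in np.flatnonzero(counts >= min_votes)]
    return np.array(peaks).reshape(-1, votes.shape[1])


if __name__ == "__main__":
    # Toy scene: two hypothetical objects, 50 primitives scattered around each.
    rng = np.random.default_rng(0)
    object_centers = np.array([[0.5, 0.5, 0.5], [3.5, 0.5, 0.5]])
    labels = np.repeat([0, 1], 50)
    centers = object_centers[labels] + rng.normal(scale=0.3, size=(100, 3))
    # Idealized offsets that point exactly at the owning object's center.
    offsets = object_centers[labels] - centers
    print(find_peaks(cast_votes(centers, offsets), cell_size=1.0, min_votes=10))
```

In this toy setup the offsets are exact, so all votes for an object fall into one cell. With learned, noisy offsets the peaks would spread over neighboring cells, and a clustering step more robust than fixed-grid binning would likely be needed.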

Taxonomy

Topics: 3D Shape Modeling and Analysis · Robotics and Sensor-Based Localization · Advanced Vision and Imaging

Methods: Contrastive Language-Image Pre-training · Focus