Point2Graph: An End-to-end Point Cloud-based 3D Open-Vocabulary Scene Graph for Robot Navigation

Yifan Xu, Ziming Luo, Qianwei Wang, Vineet Kamat, and Carol Menassa

arXiv:2409.10350·cs.RO·September 17, 2024

Open Access

TL;DR

Point2Graph introduces an end-to-end 3D scene graph generation method from point clouds, eliminating the need for RGB-D images or camera poses, and achieves state-of-the-art results in open-vocabulary scene understanding.

Contribution

The paper presents a novel hierarchical framework that combines geometry-based and learning-based methods for open-vocabulary scene graph generation directly from point clouds.

Findings

01. Outperforms current SOTA in open-vocabulary room and object segmentation.

02. Eliminates dependence on RGB-D images and camera pose data.

03. Provides an end-to-end pipeline for 3D scene understanding from point clouds.

Abstract

Current open-vocabulary scene graph generation algorithms rely heavily on both 3D scene point cloud data and posed RGB-D images, and thus have limited applications in scenarios where RGB-D images or camera poses are not readily available. To solve this problem, we propose Point2Graph, a novel end-to-end point cloud-based 3D open-vocabulary scene graph generation framework that eliminates the requirement for posed RGB-D image series. This hierarchical framework contains room and object detection/segmentation and open-vocabulary classification. For the room layer, we merge a geometry-based border detection algorithm with learning-based region detection to segment rooms, and create a "Snap-Lookup" framework for open-vocabulary room classification. In addition, we create an end-to-end pipeline for the object layer to detect and classify 3D objects based…
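
To make the two-layer hierarchy in the abstract concrete, the sketch below shows one way a room layer and an object layer could be stored and queried for navigation. It is an illustration only, not the authors' implementation: the names `ObjectNode`, `RoomNode`, `SceneGraph`, `find`, and `build_example_graph` are hypothetical, the labels and centroids are made-up sample values, and exact string matching stands in for the open-vocabulary classification the paper describes.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ObjectNode:
    # Hypothetical object-layer node: a class label plus a 3D centroid.
    label: str
    centroid: Tuple[float, float, float]


@dataclass
class RoomNode:
    # Hypothetical room-layer node that owns the objects segmented inside it.
    label: str
    objects: List[ObjectNode] = field(default_factory=list)


@dataclass
class SceneGraph:
    rooms: List[RoomNode] = field(default_factory=list)

    def find(self, object_label: str, room_label: Optional[str] = None):
        """Return (room, object) pairs whose labels match the query.

        Assumption: labels are compared by exact string equality. A real
        open-vocabulary system would compare embeddings instead.
        """
        hits = []
        for room in self.rooms:
            if room_label is not None and room.label != room_label:
                continue
            for obj in room.objects:
                if obj.label == object_label:
                    hits.append((room, obj))
        return hits


def build_example_graph() -> SceneGraph:
    # Made-up sample scene with two rooms and four objects.
    office = RoomNode("office", [
        ObjectNode("chair", (1.0, 2.0, 0.5)),
        ObjectNode("desk", (1.2, 2.5, 0.7)),
    ])
    kitchen = RoomNode("kitchen", [
        ObjectNode("chair", (5.0, 1.0, 0.5)),
        ObjectNode("fridge", (6.0, 0.5, 0.9)),
    ])
    return SceneGraph([office, kitchen])


if __name__ == "__main__":
    graph = build_example_graph()
    for room, obj in graph.find("chair", room_label="office"):
        print(room.label, obj.label, obj.centroid)
```

A query such as "the chair in the office" first narrows the search to the matching room and then to the matching object, whose centroid could serve as a navigation goal. Keeping objects nested under rooms is one simple design choice; a flat node list with explicit edges would work equally well.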

Taxonomy

Topics: Robotics and Sensor-Based Localization · Multimodal Machine Learning Applications · Robotic Path Planning Algorithms