IRS: Instance-Level 3D Scene Graphs via Room Prior Guided LiDAR-Camera Fusion
Hongming Chen, Yiyang Lin, Ziliang Li, Biyu Ye, Yuying Zhang, Ximin Lyu

TL;DR
This paper introduces a fast, robust method for constructing detailed 3D scene graphs of indoor environments by fusing LiDAR and camera data, leveraging room priors and visual foundation models for improved semantic understanding.
Contribution
It presents a novel framework combining LiDAR-camera fusion with room priors and multi-level VFMs to enhance 3D scene graph construction speed and accuracy in indoor scenes.
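The LiDAR-camera fusion idea can be illustrated with standard pinhole projection: each LiDAR point is moved into the camera frame and projected onto the image, where it picks up the label of the pixel it lands on. This is a minimal sketch, not the authors' implementation. It assumes known extrinsics `R`, `t` (LiDAR to camera, camera z axis pointing forward), intrinsics `K`, and a per-pixel `label_map` from some VFM segmentation. All function names are hypothetical, and occlusion and lens distortion are ignored.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Sequence[Sequence[float]]


def transform(R: Mat3, t: Vec3, p: Vec3) -> Vec3:
    """Rigid transform p_cam = R @ p_lidar + t (assumed LiDAR-to-camera extrinsics)."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))


def project(K: Mat3, p_cam: Vec3) -> Optional[Tuple[int, int]]:
    """Pinhole projection to integer pixel indices (u, v); None if the point is behind the camera."""
    x, y, z = p_cam
    if z <= 0:
        return None
    u = K[0][0] * x / z + K[0][2]
    v = K[1][1] * y / z + K[1][2]
    return math.floor(u), math.floor(v)


def label_points(points: List[Vec3], R: Mat3, t: Vec3, K: Mat3,
                 label_map: List[List[str]]) -> List[Optional[str]]:
    """Attach a 2D semantic label to each LiDAR point; None if it falls outside the image.

    Hypothetical helper: no z-buffer, so occluded points also receive the front label.
    """
    h, w = len(label_map), len(label_map[0])
    labels: List[Optional[str]] = []
    for p in points:
        uv = project(K, transform(R, t, p))
        if uv is not None and 0 <= uv[0] < w and 0 <= uv[1] < h:
            labels.append(label_map[uv[1]][uv[0]])
        else:
            labels.append(None)
    return labels


if __name__ == "__main__":
    R = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    t = (0.0, 0.0, 0.0)
    K = [[2, 0, 2], [0, 2, 2], [0, 0, 1]]
    label_map = [["chair", "chair", "table", "table"] for _ in range(4)]
    pts = [(0.0, 0.0, 1.0), (-0.5, 0.0, 1.0), (0.0, 0.0, -1.0), (5.0, 0.0, 1.0)]
    print(label_points(pts, R, t, K, label_map))
```

Running the example labels the first two points `table` and `chair`; the point behind the camera and the one projecting outside the 4x4 image get `None`. A real pipeline would also need a depth test so that only the nearest point per pixel inherits its label.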
Findings
Constructs scene graphs up to ten times faster than previous methods.
Maintains high semantic precision with improved robustness.
Validates effectiveness through experiments in simulated and real environments.
Abstract
Indoor scene understanding remains a fundamental challenge in robotics, with direct implications for downstream tasks such as navigation and manipulation. Traditional approaches often rely on closed-set recognition or loop closure, limiting their adaptability in open-world environments. With the advent of visual foundation models (VFMs), open-vocabulary recognition and natural language querying have become feasible, unlocking new possibilities for 3D scene graph construction. In this paper, we propose a robust and efficient framework for instance-level 3D scene graph construction via LiDAR-camera fusion. Leveraging LiDAR's wide field of view (FOV) and long-range sensing capabilities, we rapidly acquire room-level geometric priors. Multi-level VFMs are employed to improve the accuracy and consistency of semantic extraction. During instance fusion, room-based segmentation enables…
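The abstract is cut off before it explains how room-based segmentation is used during instance fusion, so the following is only a hypothetical sketch of one way a room prior could constrain fusion, not the paper's method. It assumes each room is available as a 2D floor polygon (for example, derived from LiDAR geometry), assigns each observed instance to a room by an even-odd point-in-polygon test on its centroid, and merges same-label observations only within the same room when their centroids lie within an assumed `max_dist`. All names and the threshold are illustrative.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple

Point2 = Tuple[float, float]


@dataclass
class Instance:
    label: str
    centroid: Tuple[float, float, float]
    count: int = 1  # number of observations merged into this instance


def point_in_polygon(pt: Point2, poly: Sequence[Point2]) -> bool:
    """Even-odd ray-casting test of a 2D point against a simple polygon."""
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray through pt
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_room(inst: Instance, rooms: Dict[str, Sequence[Point2]]) -> Optional[str]:
    """Return the first room whose footprint contains the instance centroid, else None."""
    for name, poly in rooms.items():
        if point_in_polygon(inst.centroid[:2], poly):
            return name
    return None


def fuse(observations: List[Instance], rooms: Dict[str, Sequence[Point2]],
         max_dist: float = 0.5) -> Dict[Optional[str], List[Instance]]:
    """Merge same-label observations that share a room and lie within max_dist (assumed metres)."""
    graph: Dict[Optional[str], List[Instance]] = {}
    for obs in observations:
        candidates = graph.setdefault(assign_room(obs, rooms), [])
        match = next((inst for inst in candidates
                      if inst.label == obs.label
                      and math.dist(inst.centroid, obs.centroid) <= max_dist), None)
        if match is None:
            candidates.append(Instance(obs.label, obs.centroid, obs.count))
        else:
            total = match.count + obs.count
            # Count-weighted running average of the centroid.
            match.centroid = tuple((a * match.count + b * obs.count) / total
                                   for a, b in zip(match.centroid, obs.centroid))
            match.count = total
    return graph


if __name__ == "__main__":
    rooms = {"kitchen": [(0, 0), (4, 0), (4, 4), (0, 4)],
             "bedroom": [(4, 0), (8, 0), (8, 4), (4, 4)]}
    obs = [Instance("chair", (1.0, 1.0, 0.0)), Instance("chair", (1.2, 1.0, 0.0)),
           Instance("chair", (3.9, 1.0, 0.0)), Instance("chair", (4.1, 1.0, 0.0))]
    for room, insts in fuse(obs, rooms).items():
        print(room, [(i.label, i.centroid, i.count) for i in insts])
```

In the example, the two chairs near (1, 1) merge into one kitchen instance, while the chairs at x = 3.9 and x = 4.1 stay separate despite being only 0.2 apart, because the wall between the two room footprints separates them. Restricting candidate matches to one room also shrinks the search set, which is one plausible reason a room prior can speed up fusion.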
Taxonomy
Topics: Robotics and Sensor-Based Localization · Advanced Neural Network Applications · Advanced Vision and Imaging
