RGB-only Active 3D Scene Graph Generation for Indoor Mobile Robots
Giorgia Modi, Davide Buoso, Giuseppe Averta, Daniele De Martini

TL;DR
This paper introduces an RGB-only active 3D scene graph generation framework for indoor robots, enabling semantic mapping without depth sensors and improving object detection through active exploration.
Contribution
It presents a novel, fully visual, active scene graph construction method that unifies perception and planning using only RGB data, applicable to diverse camera setups.
Findings
Achieves F1-score parity with depth-based methods on the Replica dataset.
Semantic-driven viewpoint selection doubles object detections compared with geometric viewpoint selection.
External RGB views enhance scene understanding without extra exploration cost.
Abstract
Current approaches to 3D scene graph generation rely on dedicated depth sensors, such as LiDAR or RGB-D cameras, for metric 3D reconstruction. This limits deployment to specialized robotic platforms and excludes settings where only RGB cameras are available, such as fixed external infrastructure. Existing pipelines also typically operate on passively collected observation trajectories, rather than selecting viewpoints based on the partially built scene representation, and therefore fail to effectively exploit the semantic and spatial information encoded within the graph during exploration. This paper presents a fully visual framework for the active, incremental construction of 3D scene graphs from RGB input only, addressing both limitations. The proposed approach unifies perception and planning around a shared structured representation that captures object semantics, 3D geometry,…
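The shared representation and graph-driven viewpoint selection described above can be illustrated with a minimal sketch. It is not the authors' implementation: the class names (`ObjectNode`, `SceneGraph`), the `semantic_gain` score (summing the detector uncertainty of nearby objects), and the `max_range` parameter are all assumptions made for illustration, since the abstract does not specify how nodes are stored or how viewpoints are scored.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field

Point = tuple[float, float, float]


@dataclass
class ObjectNode:
    """Hypothetical graph node: object semantics plus 3D geometry."""
    node_id: int
    label: str          # semantic class, e.g. "chair"
    centroid: Point     # estimated 3D position in the map frame
    confidence: float   # detector confidence in [0, 1]


@dataclass
class SceneGraph:
    """Hypothetical incremental scene graph (nodes and labeled edges)."""
    nodes: dict[int, ObjectNode] = field(default_factory=dict)
    edges: set[tuple[int, int, str]] = field(default_factory=set)

    def add_object(self, label: str, centroid: Point, confidence: float) -> int:
        node_id = len(self.nodes)
        self.nodes[node_id] = ObjectNode(node_id, label, centroid, confidence)
        return node_id

    def add_relation(self, src: int, dst: int, relation: str) -> None:
        self.edges.add((src, dst, relation))


def semantic_gain(graph: SceneGraph, viewpoint: Point, max_range: float = 3.0) -> float:
    """Assumed score: total uncertainty (1 - confidence) of objects within range."""
    return sum(
        1.0 - node.confidence
        for node in graph.nodes.values()
        if math.dist(viewpoint, node.centroid) <= max_range
    )


def select_next_viewpoint(graph: SceneGraph, candidates: list[Point]) -> Point:
    """Pick the candidate viewpoint with the highest assumed semantic gain."""
    return max(candidates, key=lambda v: semantic_gain(graph, v))


# Usage: a confidently detected chair and an uncertain table.
graph = SceneGraph()
chair = graph.add_object("chair", (0.0, 0.0, 0.0), confidence=0.9)
table = graph.add_object("table", (5.0, 0.0, 0.0), confidence=0.4)
graph.add_relation(chair, table, "near")

print(select_next_viewpoint(graph, [(0.0, 0.0, 1.0), (5.0, 0.0, 1.0)]))
# prints (5.0, 0.0, 1.0): the view near the uncertain table scores higher
```

In this sketch the planner reads the same graph that perception writes to, which is the sense in which the two share one representation. A purely geometric planner would instead score candidates by unexplored volume and ignore the labels and confidences stored in the nodes.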
