RGB-only Active 3D Scene Graph Generation for Indoor Mobile Robots

Giorgia Modi; Davide Buoso; Giuseppe Averta; Daniele De Martini

arXiv:2605.18197 · cs.RO · May 19, 2026

TL;DR

This paper introduces an RGB-only active 3D scene graph generation framework for indoor robots, enabling semantic mapping without depth sensors and improving object detection through active exploration.

Contribution

The paper presents a fully visual method for active scene graph construction that unifies perception and planning using only RGB data and applies to diverse camera setups.

Findings

01. Achieves F1-score parity with depth-based methods on the Replica dataset.

02. Semantic-driven viewpoint selection doubles object detection compared to geometric methods.

03. External RGB views enhance scene understanding without extra exploration cost.

Abstract

Current approaches to 3D scene graph generation rely on dedicated depth sensors, such as LiDAR or RGB-D cameras, for metric 3D reconstruction. This limits deployment to specialized robotic platforms and excludes settings where only RGB cameras are available, such as fixed external infrastructure. Existing pipelines also typically operate on passively collected observation trajectories, rather than selecting viewpoints based on the partially built scene representation, and therefore fail to effectively exploit the semantic and spatial information encoded within the graph during exploration. This paper presents a fully visual framework for the active, incremental construction of 3D scene graphs from RGB input only, addressing both limitations. The proposed approach unifies perception and planning around a shared structured representation that captures object semantics, 3D geometry,…
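
The idea of choosing viewpoints from the partially built graph can be illustrated with a minimal sketch. This is not the authors' implementation: the `ObjectNode` fields, the `semantic_gain` score (summed uncertainty of objects within range), and the `max_range` parameter are all hypothetical choices made only to show how a planner might query a shared scene graph.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

Point = Tuple[float, float, float]


@dataclass
class ObjectNode:
    """One object in the scene graph (hypothetical fields, not from the paper)."""
    label: str
    position: Point       # estimated 3D centroid, in metres
    confidence: float     # semantic confidence in [0, 1]


@dataclass
class SceneGraph:
    """Objects as nodes, spatial relations as labelled edges."""
    nodes: Dict[int, ObjectNode] = field(default_factory=dict)
    edges: Set[Tuple[int, int, str]] = field(default_factory=set)

    def add_object(self, node_id: int, node: ObjectNode) -> None:
        self.nodes[node_id] = node

    def relate(self, src: int, dst: int, relation: str) -> None:
        self.edges.add((src, dst, relation))


def semantic_gain(graph: SceneGraph, viewpoint: Point, max_range: float = 3.0) -> float:
    """Assumed score: total uncertainty (1 - confidence) of objects within range."""
    gain = 0.0
    for node in graph.nodes.values():
        if math.dist(viewpoint, node.position) <= max_range:
            gain += 1.0 - node.confidence
    return gain


def select_viewpoint(graph: SceneGraph, candidates: List[Point],
                     max_range: float = 3.0) -> Point:
    """Pick the candidate viewpoint with the highest assumed semantic gain."""
    return max(candidates, key=lambda v: semantic_gain(graph, v, max_range))


graph = SceneGraph()
graph.add_object(0, ObjectNode("chair", (0.0, 0.0, 0.0), confidence=0.9))
graph.add_object(1, ObjectNode("lamp", (5.0, 0.0, 0.0), confidence=0.3))
graph.relate(0, 1, "near")

candidates = [(0.5, 0.0, 0.0), (4.5, 0.0, 0.0)]
best = select_viewpoint(graph, candidates)
```

In this toy setup the planner prefers the candidate next to the low-confidence lamp, since revisiting an uncertain object is assumed to be worth more than re-observing a confident one. A purely geometric planner would ignore the `confidence` field entirely.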
