FROSS: Faster-than-Real-Time Online 3D Semantic Scene Graph Generation from RGB-D Images

Hao-Yu Hou; Chun-Yi Lee; Motoharu Sonogashira; Yasutomo Kawanishi

arXiv:2507.19993 · cs.CV · August 12, 2025

Open Access

TL;DR

FROSS introduces a novel online method for 3D semantic scene graph generation from RGB-D images that operates faster than real-time, reducing computational demands and enabling real-world applications.

Contribution

The paper presents FROSS, a new approach that lifts 2D scene graphs to 3D using object Gaussian representations, and extends the Replica dataset with relationship annotations for evaluation.
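To make the idea of an object Gaussian concrete, here is a minimal sketch of one plausible way to turn a 2D detection plus depth into a 3D Gaussian: back-project the pixels inside a bounding box with a pinhole camera model, then take the mean and covariance of the resulting points. This summary does not describe how FROSS actually performs the lifting, so the procedure, the names `Gaussian3D`, `backproject`, `fit_gaussian`, and `lift_box`, and the intrinsics `fx, fy, cx, cy` are all illustrative assumptions rather than the authors' method.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


@dataclass
class Gaussian3D:
    """Hypothetical container: an object summarized by a 3D mean and covariance."""
    mean: Point
    cov: List[List[float]]  # 3x3 covariance matrix


def backproject(u: int, v: int, depth: float,
                fx: float, fy: float, cx: float, cy: float) -> Point:
    """Pinhole back-projection of pixel (u, v) with known depth to camera coordinates."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


def fit_gaussian(points: Sequence[Point]) -> Gaussian3D:
    """Fit a Gaussian by taking the sample mean and (population) covariance."""
    n = len(points)
    if n == 0:
        raise ValueError("cannot fit a Gaussian to zero points")
    mean = tuple(sum(p[i] for p in points) / n for i in range(3))
    cov = [
        [sum((p[i] - mean[i]) * (p[j] - mean[j]) for p in points) / n
         for j in range(3)]
        for i in range(3)
    ]
    return Gaussian3D(mean, cov)


def lift_box(box: Tuple[int, int, int, int], depth_map: Sequence[Sequence[float]],
             fx: float, fy: float, cx: float, cy: float) -> Gaussian3D:
    """Assumed lifting step: Gaussian over valid-depth pixels in box (u0, v0, u1, v1)."""
    u0, v0, u1, v1 = box
    points = [
        backproject(u, v, depth_map[v][u], fx, fy, cx, cy)
        for v in range(v0, v1)
        for u in range(u0, u1)
        if depth_map[v][u] > 0  # skip pixels with missing depth
    ]
    return fit_gaussian(points)


# Toy example: a flat surface 2 m from the camera, unit focal length.
depth_map = [[2.0] * 4 for _ in range(4)]
gaussian = lift_box((0, 0, 2, 2), depth_map, fx=1.0, fy=1.0, cx=0.0, cy=0.0)
```

Summarizing an object with nine numbers (three for the mean, six for the symmetric covariance) is far cheaper to store and update than a dense point set, which is in the spirit of the efficiency argument made in the abstract.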

Findings

01

FROSS achieves faster-than-real-time performance on 3D scene graph tasks.

02

It outperforms previous methods in accuracy and efficiency.

03

The extended ReplicaSSG dataset enables comprehensive benchmarking.

Abstract

The ability to abstract complex 3D environments into simplified and structured representations is crucial across various domains. 3D semantic scene graphs (SSGs) achieve this by representing objects as nodes and their interrelationships as edges, facilitating high-level scene understanding. Existing methods for 3D SSG generation, however, face significant challenges, including high computational demands and non-incremental processing that hinder their suitability for real-time open-world applications. To address this issue, we propose FROSS (Faster-than-Real-Time Online 3D Semantic Scene Graph Generation), an innovative approach for online and faster-than-real-time 3D SSG generation that leverages the direct lifting of 2D scene graphs to 3D space and represents objects as 3D Gaussian distributions. This framework eliminates the dependency on precise and computationally-intensive point…

Taxonomy

Topics: 3D Shape Modeling and Analysis · Human Motion and Animation · Robotics and Sensor-Based Localization