TL;DR
This paper introduces a semi-automatic framework for constructing comprehensive 3D scene graphs that integrate diverse semantic information across objects, rooms, and cameras within a building, leveraging existing detection methods and multi-view consistency.
Contribution
It proposes a semi-automatic method for generating 3D scene graphs that enhances existing 2D detection methods with multi-view consistency, reducing the manual effort required.
Findings
Constructs building-scale 3D scene graphs that cover object, room, and camera semantics.
Enhances detection accuracy through multi-view consistency.
Reduces manual labor in scene graph creation.
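As a rough illustration of the multi-view consistency idea in the findings above, the sketch below fuses per-view labels for the same scene element by weighted voting. This is an assumption made for illustration, not the paper's stated procedure: the function name `fuse_labels`, the `(element_id, label, weight)` tuple layout, and the use of a simple weighted majority are all hypothetical.

```python
from collections import Counter, defaultdict


def fuse_labels(observations):
    """Fuse per-view labels into one label per scene element.

    observations: iterable of (element_id, label, weight) tuples, where each
    tuple is one detection of an element in one view (hypothetical layout).
    Returns a dict mapping element_id to the label with the largest total weight.
    """
    votes = defaultdict(Counter)
    for element_id, label, weight in observations:
        votes[element_id][label] += weight

    fused = {}
    for element_id, counter in votes.items():
        # Iterate labels in sorted order so ties resolve deterministically.
        fused[element_id] = max(sorted(counter), key=lambda lbl: counter[lbl])
    return fused


# Three views disagree on face_0; the label with the most total weight wins.
observations = [
    ("face_0", "chair", 1.0),
    ("face_0", "chair", 0.5),
    ("face_0", "sofa", 1.2),
    ("face_1", "table", 1.0),
]
fused = fuse_labels(observations)
```

A single noisy detection is outvoted when other views agree with each other, which is the intuition behind using multiple views to clean up per-image predictions.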
Abstract
A comprehensive semantic understanding of a scene is important for many applications - but in what space should diverse semantic information (e.g., objects, scene categories, material types, texture, etc.) be grounded and what should be its structure? Aspiring to have one unified structure that hosts diverse types of semantics, we follow the Scene Graph paradigm in 3D, generating a 3D Scene Graph. Given a 3D mesh and registered panoramic images, we construct a graph that spans the entire building and includes semantics on objects (e.g., class, material, and other attributes), rooms (e.g., scene category, volume, etc.) and cameras (e.g., location, etc.), as well as the relationships among these entities. However, this process is prohibitively labor heavy if done manually. To alleviate this we devise a semi-automatic framework that employs existing detection methods and enhances them…
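To make the structure described in the abstract more concrete, here is a minimal sketch of a graph with object, room, and camera nodes plus relationships among them. The class names `Node` and `SceneGraph`, the attribute keys, and the relation names `in_room` and `sees` are hypothetical choices for illustration; the source does not specify a schema.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    kind: str  # "object", "room", or "camera"
    attributes: dict = field(default_factory=dict)


@dataclass
class SceneGraph:
    nodes: dict = field(default_factory=dict)  # node_id -> Node
    edges: list = field(default_factory=list)  # (source_id, relation, target_id)

    def add_node(self, node):
        self.nodes[node.node_id] = node

    def relate(self, source_id, relation, target_id):
        # Reject edges that point at nodes not present in the graph.
        if source_id not in self.nodes or target_id not in self.nodes:
            raise KeyError("both endpoints must be added before relating them")
        self.edges.append((source_id, relation, target_id))

    def neighbors(self, source_id, relation):
        return [t for s, r, t in self.edges if s == source_id and r == relation]


# Hypothetical example: one room, one object in it, one camera observing it.
graph = SceneGraph()
graph.add_node(Node("room_1", "room", {"scene_category": "kitchen"}))
graph.add_node(Node("obj_1", "object", {"class": "chair", "material": "wood"}))
graph.add_node(Node("cam_1", "camera", {"location": (0.0, 1.5, 2.0)}))
graph.relate("obj_1", "in_room", "room_1")
graph.relate("cam_1", "sees", "obj_1")
```

Keeping attributes in a free-form dict mirrors the abstract's goal of hosting diverse semantics in one structure; a real system would likely use typed fields per node kind.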
