TB-HSU: Hierarchical 3D Scene Understanding with Contextual Affordances
Wenting Xu, Viorela Ila, Luping Zhou, Craig T. Jin

TL;DR
This paper introduces a hierarchical 3D scene graph model that captures the spatial organization and affordances of indoor scenes, using a transformer-based approach and a new dataset to improve scene understanding.
Contribution
It presents a novel transformer-based method for constructing 3D hierarchical scene graphs with region and object affordances, along with a new dataset for training and evaluation.
Findings
Improved performance over state-of-the-art models
Effective modeling of spatial and functional scene aspects
Public release of code and dataset
Abstract
The concept of function and affordance is a critical aspect of 3D scene understanding and supports task-oriented objectives. In this work, we develop a model that learns to structure and vary functional affordance across a 3D hierarchical scene graph representing the spatial organization of a scene. The varying functional affordance is designed to integrate with the varying spatial context of the graph. More specifically, we develop an algorithm that learns to construct a 3D hierarchical scene graph (3DHSG) that captures the spatial organization of the scene. Starting from segmented object point clouds and object semantic labels, we develop a 3DHSG with a top node that identifies the room label, child nodes that define local spatial regions inside the room with region-specific affordances, and grand-child nodes indicating object locations and object-specific affordances. To support this…
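The three-level structure described in the abstract (room, then regions, then objects) can be pictured with a small data-structure sketch. This is only an illustration under stated assumptions, not the authors' implementation: the class names `RoomNode`, `RegionNode`, and `ObjectNode`, the helper `affordances_of`, the use of a centroid as the object location, and every label and affordance string in the example scene are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ObjectNode:
    """Grandchild node: one segmented object with its own affordance."""
    label: str                            # object semantic label (an input to the model)
    centroid: Tuple[float, float, float]  # assumed location encoding: point-cloud centroid
    affordance: str                       # object-specific affordance


@dataclass
class RegionNode:
    """Child node: a local spatial region inside the room."""
    affordance: str                       # region-specific affordance
    objects: List[ObjectNode] = field(default_factory=list)


@dataclass
class RoomNode:
    """Top node: carries the room label for the whole scene."""
    label: str
    regions: List[RegionNode] = field(default_factory=list)

    def affordances_of(self, object_label: str) -> List[Tuple[str, str]]:
        """Return (region affordance, object affordance) pairs for a label."""
        return [
            (region.affordance, obj.affordance)
            for region in self.regions
            for obj in region.objects
            if obj.label == object_label
        ]


def build_example_scene() -> RoomNode:
    """Hand-built toy scene; all values are made up for illustration."""
    dining = RegionNode(
        affordance="dining",
        objects=[
            ObjectNode("table", (1.0, 2.0, 0.4), "place items"),
            ObjectNode("chair", (1.0, 1.2, 0.3), "sit"),
        ],
    )
    study = RegionNode(
        affordance="study",
        objects=[
            ObjectNode("desk", (4.0, 0.5, 0.4), "work"),
            ObjectNode("chair", (4.0, 1.1, 0.3), "sit"),
        ],
    )
    return RoomNode(label="living room", regions=[dining, study])


scene = build_example_scene()
chair_contexts = scene.affordances_of("chair")
```

In this toy scene the two chairs share an object label but sit under regions with different affordances, which is the kind of context a flat object list would not record. How the model in the paper actually predicts the room label, the regions, and the affordances is not covered by this sketch.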
Taxonomy
Topics: Advanced Vision and Imaging · Robotics and Sensor-Based Localization · 3D Surveying and Cultural Heritage
