TB-HSU: Hierarchical 3D Scene Understanding with Contextual Affordances

Wenting Xu; Viorela Ila; Luping Zhou; Craig T. Jin

arXiv:2412.05596 · cs.CV · February 25, 2025

TL;DR

This paper introduces a hierarchical 3D scene graph model that captures the spatial organization and affordances of indoor scenes. It uses a transformer-based approach and introduces a new dataset to improve scene understanding.

Contribution

The paper presents a novel transformer-based method for constructing 3D hierarchical scene graphs with region and object affordances, along with a new dataset for training and evaluation.

Findings

1. Improved performance over state-of-the-art models
2. Effective modeling of spatial and functional scene aspects
3. Public release of code and dataset

Abstract

The concept of function and affordance is a critical aspect of 3D scene understanding and supports task-oriented objectives. In this work, we develop a model that learns to structure and vary functional affordance across a 3D hierarchical scene graph representing the spatial organization of a scene. The varying functional affordance is designed to integrate with the varying spatial context of the graph. More specifically, we develop an algorithm that learns to construct a 3D hierarchical scene graph (3DHSG) that captures the spatial organization of the scene. Starting from segmented object point clouds and object semantic labels, we develop a 3DHSG with a top node that identifies the room label, child nodes that define local spatial regions inside the room with region-specific affordances, and grand-child nodes indicating object locations and object-specific affordances. To support this…
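The three-level graph described in the abstract (room, regions, objects) can be pictured as a nested tree. The sketch below is a minimal illustration under stated assumptions: the class names (`RoomNode`, `RegionNode`, `ObjectNode`), the use of a point-cloud centroid as the object location, the helper methods, and the example labels and affordance strings are all hypothetical and are not taken from the paper's released code. It only shows the shape of the hierarchy, not how TB-HSU's transformer predicts regions or affordances.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def centroid(points: Sequence[Point]) -> Point:
    """Mean of a segmented object point cloud (assumed stand-in for 'object location')."""
    n = len(points)
    return (
        sum(p[0] for p in points) / n,
        sum(p[1] for p in points) / n,
        sum(p[2] for p in points) / n,
    )


@dataclass
class ObjectNode:
    """Grandchild node: one object with its location and object-specific affordances."""
    label: str
    location: Point
    affordances: List[str] = field(default_factory=list)


@dataclass
class RegionNode:
    """Child node: a local spatial region inside the room with region-specific affordances."""
    name: str
    affordances: List[str] = field(default_factory=list)
    objects: List[ObjectNode] = field(default_factory=list)


@dataclass
class RoomNode:
    """Top node: identifies the room label and holds the regions."""
    room_label: str
    regions: List[RegionNode] = field(default_factory=list)

    def regions_affording(self, action: str) -> List[str]:
        # Names of regions whose region-level affordances include `action`.
        return [r.name for r in self.regions if action in r.affordances]

    def objects_affording(self, action: str) -> List[Tuple[str, str]]:
        # (region name, object label) pairs whose object-level affordances include `action`.
        return [
            (r.name, o.label)
            for r in self.regions
            for o in r.objects
            if action in o.affordances
        ]


# Hypothetical example scene: labels and affordances are illustrative only.
room = RoomNode(
    room_label="kitchen",
    regions=[
        RegionNode(
            "cooking area",
            ["cook food"],
            [ObjectNode("stove", centroid([(0, 0, 0), (1, 0, 0), (0, 1, 1)]), ["heat"])],
        ),
        RegionNode(
            "dining area",
            ["eat a meal"],
            [
                ObjectNode("table", centroid([(3, 3, 0), (4, 3, 0), (3, 4, 1)]), ["place items"]),
                ObjectNode("chair", centroid([(5, 5, 0), (5, 6, 1)]), ["sit"]),
            ],
        ),
    ],
)
```

Keeping affordances on both region and object nodes mirrors the abstract's distinction between region-specific and object-specific affordances, so a query can be answered at either level of the hierarchy.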

Taxonomy

Topics: Advanced Vision and Imaging · Robotics and Sensor-Based Localization · 3D Surveying and Cultural Heritage