TL;DR
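SceneTeract is a framework that verifies whether 3D scenes are functionally usable by embodied agents, coupling high-level semantic reasoning with low-level geometric checks. It exposes functional failures in synthetic indoor environments and mismatches between VLMs' semantic confidence and physical feasibility, and it can serve as a reward engine for VLM post-training.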
Contribution
It introduces a grounded verification engine for 3D scenes that decomposes activities into sequences of atomic actions and validates each step under agent-specific constraints, uses that engine to evaluate how well VLMs reason about affordances, and releases the accompanying tools and data.
Findings
Synthetic indoor environments frequently contain functional failures that prevent basic interactions.
VLMs show systematic mismatches between semantic confidence and physical feasibility.
SceneTeract can be used as a reward engine for VLM post-training.
Abstract
Embodied AI depends on interactive 3D environments that support meaningful activities for diverse users, yet assessing their functional affordances remains a core challenge. We introduce SceneTeract, a framework that verifies 3D scene functionality under agent-specific constraints. Our core contribution is a grounded verification engine that couples high-level semantic reasoning with low-level geometric checks. SceneTeract decomposes complex activities into sequences of atomic actions and validates each step against accessibility requirements (e.g., reachability, clearance, and navigability) conditioned on an embodied agent profile, using explicit physical and geometric simulations. We deploy SceneTeract to perform an in-depth evaluation of (i) synthetic indoor environments, uncovering frequent functional failures that prevent basic interactions, and (ii) the ability of frontier…
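The verification loop described in the abstract (decompose an activity into atomic actions, then check each step against an agent profile) can be sketched in a few lines. This is only an illustration, not SceneTeract's implementation: the names `AgentProfile`, `AtomicAction`, `check_action`, and `verify_activity`, the toy scene, and all numeric thresholds are assumptions invented here, and plain distance and width comparisons stand in for the explicit physical and geometric simulations the framework uses.

```python
from dataclasses import dataclass
from math import dist


@dataclass(frozen=True)
class AgentProfile:
    """Hypothetical embodied agent profile (not the paper's schema)."""
    name: str
    reach_m: float       # maximum reach from the standing position
    body_width_m: float  # passage width the agent needs to get through


@dataclass(frozen=True)
class AtomicAction:
    kind: str    # "navigate" or "reach" in this toy example
    target: str  # key of an object in the scene dictionary


# Toy scene (assumed values): object position, where the agent can stand
# to use it, and the narrowest passage on the way there, all in metres.
SCENE = {
    "sink": {"pos": (2.0, 0.0), "stand_pos": (2.0, 0.5), "passage_m": 0.9},
    "shelf": {"pos": (0.0, 3.0), "stand_pos": (0.0, 1.8), "passage_m": 0.6},
}


def check_action(action, agent, scene):
    """Return True if one atomic action is feasible for this agent."""
    obj = scene[action.target]
    if action.kind == "navigate":
        # Navigability proxy: the narrowest passage must fit the agent.
        return obj["passage_m"] >= agent.body_width_m
    if action.kind == "reach":
        # Reachability proxy: the object must lie within arm's reach.
        return dist(obj["stand_pos"], obj["pos"]) <= agent.reach_m
    raise ValueError(f"unknown action kind: {action.kind}")


def verify_activity(actions, agent, scene):
    """Check actions in order; return (feasible, first_failed_action)."""
    for action in actions:
        if not check_action(action, agent, scene):
            return False, action
    return True, None


adult = AgentProfile("adult", reach_m=0.8, body_width_m=0.5)
wheelchair_user = AgentProfile("wheelchair user", reach_m=0.7, body_width_m=0.75)

wash_hands = [AtomicAction("navigate", "sink"), AtomicAction("reach", "sink")]
get_book = [AtomicAction("navigate", "shelf"), AtomicAction("reach", "shelf")]

for agent in (adult, wheelchair_user):
    for label, plan in (("wash hands", wash_hands), ("get book", get_book)):
        feasible, failed = verify_activity(plan, agent, SCENE)
        print(agent.name, label, feasible, failed)
```

Returning the first failed action, rather than a bare boolean, keeps the result tied to a specific step and agent, which is the kind of per-step signal a grounded check can provide and a purely semantic judgment cannot.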
