SceneTeract: Agentic Functional Affordances and VLM Grounding in 3D Scenes

Léopold Maillard; Francis Engelmann; Tom Durand; Boxiao Pan; Yang You; Or Litany; Leonidas Guibas; Maks Ovsjanikov

arXiv:2603.29798·cs.CV·April 1, 2026

TL;DR

SceneTeract is a framework that verifies whether 3D scenes are functionally usable by embodied agents, combining semantic reasoning with geometric checks. It exposes functional failures in synthetic indoor scenes and gaps in how current VLMs reason about physical feasibility.

Contribution

The paper introduces a grounded verification engine for 3D scenes that decomposes activities into atomic actions, evaluates VLMs' ability to reason about affordances, and releases tools and data.

Findings

01. Frequent functional failures in synthetic indoor environments.

02. Systematic mismatches between semantic confidence and physical feasibility in VLMs.

03. SceneTeract can be used as a reward engine for VLM post-training.

Abstract

Embodied AI depends on interactive 3D environments that support meaningful activities for diverse users, yet assessing their functional affordances remains a core challenge. We introduce SceneTeract, a framework that verifies 3D scene functionality under agent-specific constraints. Our core contribution is a grounded verification engine that couples high-level semantic reasoning with low-level geometric checks. SceneTeract decomposes complex activities into sequences of atomic actions and validates each step against accessibility requirements (e.g., reachability, clearance, and navigability) conditioned on an embodied agent profile, using explicit physical and geometric simulations. We deploy SceneTeract to perform an in-depth evaluation of (i) synthetic indoor environments, uncovering frequent functional failures that prevent basic interactions, and (ii) the ability of frontier…
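
The verification loop described in the abstract (decompose an activity into atomic actions, then check each step against reachability, clearance, and navigability for a given agent profile) can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the SceneTeract code: the names AgentProfile, SceneObject, path_exists, and verify_activity are hypothetical, the scene is reduced to a 2D occupancy grid, and the physical checks are simple threshold comparisons rather than the explicit simulations the paper refers to.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentProfile:
    """Hypothetical agent profile: only two body parameters for illustration."""
    name: str
    max_reach_height_m: float
    body_width_m: float


@dataclass(frozen=True)
class SceneObject:
    """Hypothetical scene object, reduced to what the toy checks need."""
    name: str
    stand_cell: tuple[int, int]  # free grid cell from which the object is used
    handle_height_m: float       # height of the part the agent must touch
    clearance_m: float           # free width in front of the object


def path_exists(grid, start, goal):
    """Navigability check: BFS over 4-connected free cells (0 = free, 1 = blocked)."""
    rows, cols = len(grid), len(grid[0])
    seen, queue = {start}, deque([start])
    while queue:
        r, c = queue.popleft()
        if (r, c) == goal:
            return True
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0 and (nr, nc) not in seen:
                seen.add((nr, nc))
                queue.append((nr, nc))
    return False


def verify_activity(plan, objects, grid, start, agent):
    """Check each (kind, target) step in order; stop at the first failing step.

    Returns a list of (kind, target, passed, check_name) tuples.
    """
    position, results = start, []
    for kind, target in plan:
        obj = objects[target]
        if kind == "navigate":
            ok, check = path_exists(grid, position, obj.stand_cell), "navigability"
            if ok:
                position = obj.stand_cell
        elif kind == "reach":
            if position != obj.stand_cell:
                ok, check = False, "navigability"
            elif obj.clearance_m < agent.body_width_m:
                ok, check = False, "clearance"
            else:
                ok, check = obj.handle_height_m <= agent.max_reach_height_m, "reachability"
        else:
            raise ValueError(f"unknown atomic action: {kind}")
        results.append((kind, target, ok, check))
        if not ok:
            break
    return results


# Toy scene: a wall (row 1) with a single gap in the last column.
grid = [
    [0, 0, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 0],
]
objects = {"shelf": SceneObject("shelf", stand_cell=(2, 0), handle_height_m=1.8, clearance_m=0.9)}
plan = [("navigate", "shelf"), ("reach", "shelf")]  # "take an item from the shelf"

standing = AgentProfile("standing adult", max_reach_height_m=2.0, body_width_m=0.6)
seated = AgentProfile("seated user", max_reach_height_m=1.4, body_width_m=0.8)

standing_report = verify_activity(plan, objects, grid, (0, 0), standing)
seated_report = verify_activity(plan, objects, grid, (0, 0), seated)
```

In this toy scene the same plan passes for the standing profile but fails at the reach step for the seated profile, because the 1.8 m handle exceeds the assumed 1.4 m reach. This is the sense in which the same scene can be functional for one agent and not for another.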

Code & Models

Repositories

leopoldmaillard/sceneteract (GitHub)