Edit-As-Act: Goal-Regressive Planning for Open-Vocabulary 3D Indoor Scene Editing

Seongrae Noh; SeungWon Seo; Gyeong-Moon Park; HyeongYeop Kang

arXiv:2603.17583 · cs.CV · March 19, 2026

TL;DR

This paper introduces Edit-As-Act, a goal-regressive planning framework for open-vocabulary 3D indoor scene editing that ensures minimal, physically consistent modifications aligned with natural language instructions.

Contribution

The paper proposes a planning-based approach that predicts symbolic goal predicates and plans in EditLang, a PDDL-inspired action language, to improve editing fidelity and physical plausibility.

Findings

01. Outperforms prior methods on E2A-Bench across all tasks

02. Achieves high instruction fidelity and semantic consistency

03. Ensures physically coherent scene transformations

Abstract

Editing a 3D indoor scene from natural language is conceptually straightforward but technically challenging. Existing open-vocabulary systems often regenerate large portions of a scene or rely on image-space edits that disrupt spatial structure, resulting in unintended global changes or physically inconsistent layouts. These limitations stem from treating editing primarily as a generative task. We take a different view. A user instruction defines a desired world state, and editing should be the minimal sequence of actions that makes this state true while preserving everything else. This perspective motivates Edit-As-Act, a framework that performs open-vocabulary scene editing as goal-regressive planning in 3D space. Given a source scene and free-form instruction, Edit-As-Act predicts symbolic goal predicates and plans in EditLang, a PDDL-inspired action language that we design with…
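
To make the idea of goal-regressive planning concrete, here is a minimal sketch of backward search over STRIPS-style actions. It is not EditLang and not the authors' implementation: the `Action` type, the predicate strings, and the two actions (`remove_vase`, `move_lamp_to_table`) are hypothetical and chosen only for illustration. The sketch starts from the goal predicates, regresses them through actions whose effects achieve part of the goal, and stops once the remaining subgoal already holds in the source scene.

```python
from collections import deque
from typing import NamedTuple, Optional


class Action(NamedTuple):
    """Hypothetical STRIPS-style action (illustrative, not EditLang)."""
    name: str
    pre: frozenset      # predicates that must hold before the action
    add: frozenset      # predicates the action makes true
    delete: frozenset   # predicates the action makes false


def regress(goal: frozenset, action: Action) -> Optional[frozenset]:
    """Subgoal that must hold before `action` so that `goal` holds after it.

    Returns None if the action achieves no part of the goal or destroys part of it.
    """
    if not (action.add & goal) or (action.delete & goal):
        return None
    return (goal - action.add) | action.pre


def plan_by_regression(state, goal, actions, max_depth=6):
    """Breadth-first backward search from the goal toward the current state."""
    start = frozenset(goal)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        subgoal, plan = queue.popleft()
        if subgoal <= state:          # everything still required already holds
            return plan
        if len(plan) >= max_depth:
            continue
        for action in actions:
            prev = regress(subgoal, action)
            if prev is not None and prev not in seen:
                seen.add(prev)
                # The regressed action runs before the steps found so far.
                queue.append((prev, [action.name] + plan))
    return None


# Toy scene: the lamp is on the floor and a vase occupies the table.
state = frozenset({"on(lamp,floor)", "on(vase,table)"})
goal = {"on(lamp,table)"}
actions = [
    Action("move_lamp_to_table",
           pre=frozenset({"on(lamp,floor)", "clear(table)"}),
           add=frozenset({"on(lamp,table)"}),
           delete=frozenset({"on(lamp,floor)", "clear(table)"})),
    Action("remove_vase",
           pre=frozenset({"on(vase,table)"}),
           add=frozenset({"clear(table)"}),
           delete=frozenset({"on(vase,table)"})),
]
plan = plan_by_regression(state, goal, actions)
print(plan)
```

Because the search works backward from the goal, only actions that contribute to the requested state enter the plan, which loosely mirrors the abstract's emphasis on minimal edits that preserve the rest of the scene.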

Taxonomy

Topics: Multimodal Machine Learning Applications · Human Motion and Animation · Social Robot Interaction and HRI