Free-Form Scene Editor: Enabling Multi-Round Object Manipulation like in a 3D Engine
Xincheng Shuai, Zhenyuan Qin, Henghui Ding, Dacheng Tao

TL;DR
FFSE is a 3D-aware editing framework for intuitive, multi-round object manipulation on real images. It maintains scene consistency and realistic effects such as shadows and reflections, and the authors report that it surpasses previous methods in flexibility and quality.
Contribution
We introduce FFSE, a 3D-aware autoregressive framework for multi-round object editing, and the 3DObjectEditor dataset for training and evaluation.
Findings
FFSE outperforms existing methods in 3D-aware editing tasks.
FFSE enables arbitrary object manipulations such as translation, scaling, and rotation.
FFSE maintains scene realism and consistency across multiple editing rounds.
Abstract
Recent advances in text-to-image (T2I) diffusion models have significantly improved semantic image editing, yet most methods fall short in performing 3D-aware object manipulation. In this work, we present FFSE, a 3D-aware autoregressive framework designed to enable intuitive, physically consistent object editing directly on real-world images. Unlike previous approaches that either operate in image space or require slow and error-prone 3D reconstruction, FFSE models editing as a sequence of learned 3D transformations, allowing users to perform arbitrary manipulations, such as translation, scaling, and rotation, while preserving realistic background effects (e.g., shadows, reflections) and maintaining global scene consistency across multiple editing rounds. To support learning of multi-round 3D-aware object manipulation, we introduce 3DObjectEditor, a hybrid dataset constructed from…
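To make the idea of "editing as a sequence of 3D transformations" concrete, the sketch below composes several editing rounds (translation, scaling, rotation) into one object pose using explicit 4x4 homogeneous matrices. This is only an illustration of the underlying geometry, not FFSE itself: the paper describes learned transformations inside a generative model, whereas here the matrices are written by hand. All function names (`translation`, `scaling`, `rotation_y`, `compose_rounds`, `apply_pose`) are hypothetical and do not come from the paper.

```python
import math

def identity():
    """4x4 identity matrix: the object's pose before any edit."""
    return [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]

def translation(tx, ty, tz):
    """Homogeneous matrix that moves a point by (tx, ty, tz)."""
    m = identity()
    m[0][3], m[1][3], m[2][3] = tx, ty, tz
    return m

def scaling(s):
    """Homogeneous matrix that scales uniformly about the origin."""
    m = identity()
    m[0][0] = m[1][1] = m[2][2] = s
    return m

def rotation_y(degrees):
    """Homogeneous matrix that rotates about the y axis."""
    c = math.cos(math.radians(degrees))
    s = math.sin(math.radians(degrees))
    m = identity()
    m[0][0], m[0][2] = c, s
    m[2][0], m[2][2] = -s, c
    return m

def matmul(a, b):
    """Plain 4x4 matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def compose_rounds(rounds):
    """Fold a list of per-round edits into one cumulative pose.

    Each new edit is multiplied on the left, so it acts on the result
    of all earlier rounds (assumption: edits are expressed in a fixed
    world frame, not in the object's local frame).
    """
    pose = identity()
    for edit in rounds:
        pose = matmul(edit, pose)
    return pose

def apply_pose(pose, point):
    """Apply a 4x4 pose to a 3D point and return the transformed point."""
    v = [point[0], point[1], point[2], 1.0]
    return tuple(sum(pose[i][k] * v[k] for k in range(4)) for i in range(3))

# Three editing rounds applied one after another to the same object.
rounds = [translation(1.0, 0.0, 0.0), scaling(2.0), rotation_y(90.0)]
pose = compose_rounds(rounds)
moved = apply_pose(pose, (0.0, 0.0, 0.0))
```

Because the cumulative pose is a single matrix, a later round only needs the previous pose and the new edit, which mirrors the multi-round setting described in the abstract; how FFSE actually represents and conditions on these transformations is not specified in this excerpt.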
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · 3D Shape Modeling and Analysis · Computer Graphics and Visualization Techniques
