TRACE: High-Fidelity 3D Scene Editing via Tangible Reconstruction and Geometry-Aligned Contextual Video Masking
Jiyuan Hu, Zechuan Zhang, Zongxin Yang, Yi Yang

TL;DR
TRACE introduces a mesh-guided 3D scene editing framework that combines explicit geometry with video diffusion, enabling detailed, structural, and temporally consistent scene modifications.
Contribution
It presents a novel multi-view 3D-Anchor synthesis method and a geometry-aligned contextual masking approach for high-fidelity, part-level 3D scene editing.
Findings
Outperforms existing methods in editing versatility.
Achieves high structural integrity in scene modifications.
Demonstrates temporally stable, physically-grounded rendering.
Abstract
We present TRACE, a mesh-guided 3DGS editing framework that achieves automated, high-fidelity scene transformation. By anchoring video diffusion with explicit 3D geometry, TRACE uniquely enables fine-grained, part-level manipulation, such as local pose shifting or component replacement, while preserving the structural integrity of the central subject, a capability largely absent in existing editing methods. Our approach comprises three key stages: (1) Multi-view 3D-Anchor Synthesis, which leverages a sparse-view editor trained on our MV-TRACE dataset (the first multi-view consistent dataset dedicated to scene-coherent object addition and modification) to generate spatially consistent 3D-anchors; (2) Tangible Geometry Anchoring (TGA), which ensures precise spatial synchronization between inserted meshes and the 3DGS scene via two-phase registration; and (3) Contextual Video Masking (CVM),…
