View-Consistent 3D Scene Editing via Dual-Path Structural Correspondence and Semantic Continuity
Pufan Li, Bi'an Du, Shenghe Zheng, Junyi Yao, Wei Hu

TL;DR
This paper introduces a novel framework for text-driven 3D scene editing that explicitly models cross-view dependencies, ensuring view consistency through dual-path structural and semantic mechanisms, and demonstrates superior performance on a new dataset.
Contribution
It proposes a view-consistent 3D editing method with dual-path structural and semantic cues, addressing cross-view inconsistency more robustly than prior approaches.
Findings
Achieves superior editing performance with consistent multi-view outputs.
Introduces a paired multi-view editing dataset for training and evaluation.
Demonstrates robustness and generalization in complex scene editing.
Abstract
Text-driven 3D scene editing has recently attracted increasing attention. Most existing methods follow a render-edit-optimize pipeline, where multi-view images are rendered from a 3D scene, edited with 2D image editors, and then used to optimize the underlying 3D representation. However, cross-view inconsistency remains a major bottleneck. Although recent methods introduce geometric cues, cross-view interactions, or video priors to mitigate this issue, they still largely rely on inference-time synchronization and thus remain limited in robustness and generalization. In this work, we recast multi-view consistent 3D editing from a distributional perspective: 3D scene editing essentially requires modeling a joint distribution across viewpoints. Based on this insight, we propose a view-consistent 3D editing framework that explicitly introduces cross-view dependencies into the editing process.…
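The render-edit-optimize pipeline and the inconsistency problem it suffers from can be illustrated with a toy sketch. Everything below is a hypothetical simplification, not the authors' implementation: the "scene" is a list of per-voxel scalars, each view renders a fixed, overlapping subset of voxels, the hypothetical "editor" adds an offset to every pixel, and optimization is plain gradient descent on squared error. When every view receives the same edit, one scene fits all edited views exactly; when the offset varies by view (as independent per-view 2D editing can produce), no single scene fits them all and a residual remains.

```python
# Toy render-edit-optimize loop (hypothetical illustration, not the paper's method).
from typing import Callable, List, Tuple

Scene = List[float]  # toy "3D" scene: one scalar per voxel
Image = List[float]  # toy "2D" render: one scalar per pixel

# Hypothetical views: each observes a subset of voxels; overlaps model shared geometry.
VIEWS: List[List[int]] = [[0, 1, 2], [1, 2, 3], [2, 3, 0]]


def render(scene: Scene, view: List[int]) -> Image:
    """Render a view by reading the voxels it observes."""
    return [scene[i] for i in view]


def optimize(scene: Scene, targets: List[Image], steps: int = 500, lr: float = 0.1) -> Scene:
    """Gradient descent on the summed squared error between renders and edited targets."""
    scene = list(scene)
    for _ in range(steps):
        grad = [0.0] * len(scene)
        for view, target in zip(VIEWS, targets):
            for pix, vox in enumerate(view):
                grad[vox] += 2.0 * (scene[vox] - target[pix])
        scene = [s - lr * g for s, g in zip(scene, grad)]
    return scene


def residual(scene: Scene, targets: List[Image]) -> float:
    """Squared error left over after fitting; zero only if the edited views agree."""
    return sum(
        (p - t) ** 2
        for view, target in zip(VIEWS, targets)
        for p, t in zip(render(scene, view), target)
    )


def render_edit_optimize(scene: Scene, edit: Callable[[Image, int], Image]) -> Tuple[Scene, float]:
    """Render every view, edit each one independently, then refit the scene."""
    edited = [edit(render(scene, view), k) for k, view in enumerate(VIEWS)]
    new_scene = optimize(scene, edited)
    return new_scene, residual(new_scene, edited)


scene0 = [0.0, 0.0, 0.0, 0.0]
# Consistent edit: every view receives the same offset.
consistent_scene, consistent_res = render_edit_optimize(scene0, lambda img, k: [p + 1.0 for p in img])
# Inconsistent edit: the offset depends on the view index k.
inconsistent_scene, inconsistent_res = render_edit_optimize(
    scene0, lambda img, k: [p + 1.0 + 0.5 * k for p in img]
)
print(f"consistent residual:   {consistent_res:.4f}")
print(f"inconsistent residual: {inconsistent_res:.4f}")
```

In this toy setting the consistent edit converges to a residual of essentially zero, while the view-dependent edit leaves a residual of 1.25: voxels seen by several views receive conflicting targets and settle at their average, which is the kind of blurring that motivates modeling the views jointly rather than editing them one at a time.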
