Fast Multi-view Consistent 3D Editing with Video Priors
Liyi Chen, Ruihuang Li, Guowen Zhang, Pengfei Wang, Lei Zhang

TL;DR
This paper introduces ViP3DE, a method that uses pre-trained video generation models to perform multi-view consistent 3D editing from text instructions in a single forward pass, avoiding the iterative 2D-3D-2D updating that makes existing methods slow and prone to over-smoothed results.
Contribution
The paper proposes using video priors for 3D editing, which bypasses iterative 2D-3D-2D updating, and incorporates geometry-aware denoising to enhance multi-view consistency.
Findings
Achieves high-quality 3D editing in a single forward pass.
Significantly outperforms existing methods in editing quality.
Runs faster than existing methods while improving multi-view consistency.
Abstract
Text-driven 3D editing enables user-friendly 3D object or scene editing with text instructions. Due to the lack of multi-view consistency priors, existing methods typically resort to employing 2D generation or editing models to process each view individually, followed by iterative 2D-3D-2D updating. However, these methods are not only time-consuming but also prone to over-smoothed results because the different editing signals gathered from different views are averaged during the iterative process. In this paper, we propose generative Video Prior based 3D Editing (ViP3DE) to employ the temporal consistency priors from pre-trained video generation models for multi-view consistent 3D editing in a single forward pass. Our key insight is to condition the video generation model on a single edited view to generate other consistent edited views for 3D updating directly, thereby bypassing the…
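The over-smoothing argument in the abstract can be illustrated with a toy example. The sketch below is not the paper's method or data: it assumes a hypothetical 1D "texture" with one sharp edge and four views whose independent edits place that edge at slightly different positions. Averaging the inconsistent edits, as iterative 2D-3D-2D updating effectively does, turns the sharp edge into a gradual ramp.

```python
def step_edge(length, position):
    """Toy 1D texture with a sharp edge: 0.0 before `position`, 1.0 from it on."""
    return [0.0 if i < position else 1.0 for i in range(length)]


def average_views(views):
    """Per-sample mean across views (a stand-in for fusing inconsistent edits)."""
    n = len(views)
    return [sum(samples) / n for samples in zip(*views)]


def max_gradient(signal):
    """Largest jump between neighbouring samples; a crude sharpness measure."""
    return max(abs(b - a) for a, b in zip(signal, signal[1:]))


# Hypothetical setup: each view is edited independently, so the edited edge
# lands at a slightly different position in each one.
edge_positions = [6, 8, 10, 12]
per_view_edits = [step_edge(20, p) for p in edge_positions]

averaged = average_views(per_view_edits)

print("sharpness of one edited view:", max_gradient(per_view_edits[0]))
print("sharpness after averaging:   ", max_gradient(averaged))
```

Each individual view keeps a full-strength edge, while the averaged signal climbs in four small steps. Generating all views from a single edited view, as the abstract proposes, is meant to avoid this disagreement in the first place.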
Taxonomy
Topics: 3D Shape Modeling and Analysis · Advanced Vision and Imaging · Generative Adversarial Networks and Image Synthesis
