3DEgo: 3D Editing on the Go!
Umar Khalid, Hasan Iqbal, Azib Farooq, Jing Hua, Chen Chen

TL;DR
3DEgo presents a streamlined method for photorealistic 3D scene synthesis from monocular videos guided by text prompts, eliminating traditional multi-stage processes and enhancing editing consistency and speed.
Contribution
It introduces a single-stage workflow that bypasses COLMAP, using diffusion models and 3D Gaussian Splatting for efficient, high-quality 3D scene editing from videos.
Findings
Achieves high editing precision and speed
Works across diverse video datasets
Eliminates need for model fine-tuning
Abstract
We introduce 3DEgo to address a novel problem of directly synthesizing photorealistic 3D scenes from monocular videos guided by textual prompts. Conventional methods construct a text-conditioned 3D scene through a three-stage process, involving pose estimation using Structure-from-Motion (SfM) libraries like COLMAP, initializing the 3D model with unedited images, and iteratively updating the dataset with edited images to achieve a 3D scene with text fidelity. Our framework streamlines the conventional multi-stage 3D editing process into a single-stage workflow by overcoming the reliance on COLMAP and eliminating the cost of model initialization. We apply a diffusion model to edit video frames prior to 3D scene creation by incorporating our designed noise blender module for enhancing multi-view editing consistency, a step that does not require additional training or fine-tuning of T2I…
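The abstract names a noise blender module but, as excerpted here, does not describe how it works. Purely to illustrate the general idea of encouraging agreement between per-frame edits, the sketch below mixes each frame's noise estimate with the average over all frames. The function name `blend_noise`, the `weight` parameter, and the averaging rule are all assumptions made for this example; they are not taken from the paper and should not be read as the authors' module.

```python
from typing import List

# Hypothetical type: one flattened noise estimate per video frame.
Frame = List[float]


def blend_noise(per_frame_noise: List[Frame], weight: float = 0.5) -> List[Frame]:
    """Mix each frame's noise estimate with the mean over all frames.

    Illustrative assumption only, not the paper's noise blender.
    weight=0 leaves every frame unchanged; weight=1 gives every frame
    the same shared estimate.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    if not per_frame_noise:
        return []

    size = len(per_frame_noise[0])
    if any(len(frame) != size for frame in per_frame_noise):
        raise ValueError("all frames must have the same length")

    # Element-wise mean across frames: the estimate all views share.
    count = len(per_frame_noise)
    shared = [sum(frame[i] for frame in per_frame_noise) / count for i in range(size)]

    # Pull every frame toward the shared estimate by the given weight.
    return [
        [(1.0 - weight) * value + weight * mean for value, mean in zip(frame, shared)]
        for frame in per_frame_noise
    ]


if __name__ == "__main__":
    frames = [[0.0, 2.0], [2.0, 4.0]]
    print(blend_noise(frames, weight=0.5))
```

In this toy setting a larger `weight` trades per-frame detail for cross-frame agreement, which is the kind of tension a multi-view consistency mechanism has to manage.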
Taxonomy
Topics: 3D Modeling in Geospatial Applications · 3D Shape Modeling and Analysis · Computer Graphics and Visualization Techniques
Methods: Diffusion
