Instruct 4D-to-4D: Editing 4D Scenes as Pseudo-3D Scenes Using 2D Diffusion
Linzhan Mou, Jun-Kun Chen, Yu-Xiong Wang

TL;DR
This paper introduces Instruct 4D-to-4D, a method that enables consistent and detailed editing of 4D scenes by treating them as pseudo-3D scenes, extending 2D diffusion models to handle spatial-temporal data.
Contribution
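The paper decouples 4D scene editing into two sub-problems, temporally consistent video editing and application of those edits to the pseudo-3D scene, and augments a 2D diffusion model (Instruct-Pix2Pix) with an anchor-aware attention module and optical flow-guided appearance propagation.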
Findings
Achieves spatially and temporally consistent 4D scene editing.
Produces sharper, more detailed results than previous methods.
Applicable to both monocular and multi-camera scenes.
Abstract
This paper proposes Instruct 4D-to-4D that achieves 4D awareness and spatial-temporal consistency for 2D diffusion models to generate high-quality instruction-guided dynamic scene editing results. Traditional applications of 2D diffusion models in dynamic scene editing often result in inconsistency, primarily due to their inherent frame-by-frame editing methodology. Addressing the complexities of extending instruction-guided editing to 4D, our key insight is to treat a 4D scene as a pseudo-3D scene, decoupled into two sub-problems: achieving temporal consistency in video editing and applying these edits to the pseudo-3D scene. Following this, we first enhance the Instruct-Pix2Pix (IP2P) model with an anchor-aware attention module for batch processing and consistent editing. Additionally, we integrate optical flow-guided appearance propagation in a sliding window fashion for more precise…
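The abstract names two mechanisms: batches of frames edited together with a shared anchor, and appearance propagated along optical flow in a sliding window. The paper's own implementation is not reproduced here. The following is only a minimal NumPy sketch of the two ideas in isolation, under stated assumptions: the function names `sliding_windows` and `warp_with_flow` are hypothetical, the anchor is a single fixed frame index, the flow is a dense backward flow stored as (dx, dy) per pixel, and nearest-neighbour lookup stands in for whatever interpolation the authors use.

```python
import numpy as np

def sliding_windows(num_frames, window, anchor=0):
    """Yield frame-index batches; every batch starts with the shared anchor index.

    Assumption: one fixed anchor frame is prepended to each window so that a
    batch-wise editor could attend to the same reference in every batch.
    """
    for start in range(0, num_frames, window):
        stop = min(start + window, num_frames)
        batch = [i for i in range(start, stop) if i != anchor]
        yield [anchor] + batch

def warp_with_flow(src, flow):
    """Backward-warp `src` (H, W, C) with `flow` (H, W, 2) via nearest-neighbour lookup.

    Assumption: flow[y, x] = (dx, dy) points from a target pixel to the source
    location it should copy from; out-of-range lookups are clamped to the border.
    """
    h, w = src.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    sx = np.clip(np.rint(xs + flow[..., 0]).astype(int), 0, w - 1)
    sy = np.clip(np.rint(ys + flow[..., 1]).astype(int), 0, h - 1)
    return src[sy, sx]
```

In this sketch an edited frame would be warped toward its neighbours inside each window, so that the edit's appearance is carried along the estimated motion rather than regenerated independently per frame. Occlusion handling and blending are deliberately left out.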
Taxonomy
Topics: 3D Modeling in Geospatial Applications
Methods: Diffusion
