DynVideo-E: Harnessing Dynamic NeRF for Large-Scale Motion- and View-Change Human-Centric Video Editing
Jia-Wei Liu, Yan-Pei Cao, Jay Zhangjie Wu, Weijia Mao, Yuchao Gu, Rui Zhao, Jussi Keppo, Ying Shan, Mike Zheng Shou

TL;DR
DynVideo-E uses dynamic NeRF as the video representation for large-scale, human-centric video editing. Because edits are made in 3D space, they stay consistent and controllable across long videos with significant motion and view changes.
Contribution
It pioneers dynamic NeRF as the video representation for long-range, human-centric video editing and pairs it with an image-based video-NeRF editing pipeline.
Findings
Outperforms state-of-the-art methods by 50-95% in human preference.
Effectively handles large-scale motion and view changes in videos.
Enables consistent, controllable editing in long, complex videos.
Abstract
Despite recent progress in diffusion-based video editing, existing methods are limited to short-length videos due to the contradiction between long-range consistency and frame-wise editing. Prior attempts to address this challenge by introducing video-2D representations encounter significant difficulties with large-scale motion- and view-change videos, especially in human-centric scenarios. To overcome this, we propose to introduce the dynamic Neural Radiance Fields (NeRF) as the innovative video representation, where the editing can be performed in the 3D spaces and propagated to the entire video via the deformation field. To provide consistent and controllable editing, we propose the image-based video-NeRF editing pipeline with a set of innovative designs, including multi-view multi-pose Score Distillation Sampling (SDS) from both the 2D personalized diffusion prior and 3D diffusion…
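The idea of editing once in 3D space and propagating the edit through a deformation field can be illustrated with a toy example. This is a minimal sketch, not the authors' implementation: the paper's fields are learned, whereas here closed-form stand-ins are used, and the names `deform_to_canonical`, `canonical_color`, and `render_frame` are hypothetical.

```python
import numpy as np

def deform_to_canonical(points, t):
    """Toy deformation field (assumption): the subject drifts along x over
    time, so observed points at time t are shifted back to canonical space."""
    offset = np.array([0.5 * t, 0.0, 0.0])
    return points - offset

def canonical_color(points, edited=False):
    """Toy canonical appearance (assumption): a unit sphere that is red by
    default and blue once the canonical model has been edited."""
    inside = np.linalg.norm(points, axis=-1) < 1.0
    base = np.array([0.0, 0.0, 1.0]) if edited else np.array([1.0, 0.0, 0.0])
    # Points outside the sphere are treated as empty (black).
    return np.where(inside[:, None], base, 0.0)

def render_frame(points, t, edited=False):
    """Color of observed points at time t: warp to canonical, then look up."""
    return canonical_color(deform_to_canonical(points, t), edited)

# The subject's center sits at x = 0.5 * t in each frame. A single edit to the
# canonical model changes the color at that location in every frame.
for t in (0.0, 1.0, 2.0):
    center = np.array([[0.5 * t, 0.0, 0.0]])
    print(t, render_frame(center, t), render_frame(center, t, edited=True))
```

Since every frame looks up appearance in the same canonical space, the edit does not need to be repeated per frame, which is the property the abstract relies on for long-range consistency.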
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Advanced Vision and Imaging · Video Analysis and Summarization
Methods: Diffusion
