Denoising Reuse: Exploiting Inter-frame Motion Consistency for Efficient Video Latent Generation

Chenyu Wang; Shuo Yan; Yixuan Chen; Yujiang Wang; Mingzhi Dong; Xiaochen Yang; Dongsheng Li; Robert P. Dick; Qin Lv; Fan Yang; Tun Lu; Ning Gu; Li Shang

arXiv:2409.12532·cs.CV·September 20, 2024

TL;DR

This paper introduces Dr. Mo, a method that accelerates diffusion-based video generation by leveraging inter-frame motion consistency to reduce redundant computations, while maintaining high visual quality.

Contribution

The paper propagates coarse-grained noise from early denoising steps across frames using lightweight inter-frame motion cues, and uses a meta-network to adaptively select the step at which generation switches from motion-based propagation to denoising.

Findings

01. Significantly faster video generation with maintained quality.

02. Effective motion-based noise propagation reduces computational redundancy.

03. Adaptive step selection balances efficiency and visual fidelity.

Abstract

Video generation using diffusion-based models is constrained by high computational costs due to the frame-wise iterative diffusion process. This work presents a Diffusion Reuse MOtion (Dr. Mo) network to accelerate latent video generation. Our key discovery is that coarse-grained noises in earlier denoising steps exhibit high motion consistency across consecutive video frames. Following this observation, Dr. Mo propagates those coarse-grained noises onto the next frame by incorporating carefully designed, lightweight inter-frame motions, eliminating massive computational redundancy in frame-wise diffusion models. The more sensitive, fine-grained noises are still acquired via later denoising steps, which can be essential to retaining visual quality. As such, deciding which intermediate steps should switch from motion-based propagation to denoising can be a crucial problem…
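The reuse idea in the abstract can be illustrated with a toy control-flow sketch. This is not the authors' implementation: `toy_denoise_step`, `toy_motion_warp`, `generate_with_reuse`, and the fixed `switch_step` are hypothetical stand-ins (the paper learns the motion and selects the switch step adaptively), and a latent is reduced to a flat list of floats. The sketch only shows where denoiser calls are saved when later frames start from a motion-propagated intermediate latent instead of from pure noise.

```python
import random

def toy_denoise_step(latent, step):
    """Hypothetical stand-in for one denoiser call; not the paper's network."""
    return [0.9 * v for v in latent]

def toy_motion_warp(latent, shift):
    """Hypothetical stand-in for lightweight inter-frame motion: a cyclic shift."""
    shift %= len(latent)
    return latent[-shift:] + latent[:-shift] if shift else list(latent)

def generate_with_reuse(num_frames, total_steps, switch_step, size=16, seed=0):
    """Return (frames, denoiser_calls) for a toy reuse schedule.

    The first frame runs all denoising steps and caches its latent at
    switch_step. Each later frame warps the cached latent with the toy
    motion and runs only the remaining steps.
    """
    if not 0 <= switch_step < total_steps:
        raise ValueError("switch_step must be in [0, total_steps)")
    rng = random.Random(seed)
    calls = 0

    # Base frame: full denoising from pure noise.
    latent = [rng.gauss(0.0, 1.0) for _ in range(size)]
    cached = None
    for step in range(total_steps):
        if step == switch_step:
            cached = list(latent)  # coarse-grained intermediate latent
        latent = toy_denoise_step(latent, step)
        calls += 1
    frames = [latent]

    # Later frames: propagate the cached latent, then finish denoising.
    for _ in range(1, num_frames):
        cached = toy_motion_warp(cached, shift=1)
        latent = list(cached)
        for step in range(switch_step, total_steps):
            latent = toy_denoise_step(latent, step)
            calls += 1
        frames.append(latent)
    return frames, calls

if __name__ == "__main__":
    frames, calls = generate_with_reuse(num_frames=8, total_steps=50, switch_step=40)
    print(len(frames), calls)
```

In this toy schedule the call count is `total_steps + (num_frames - 1) * (total_steps - switch_step)`, so 8 frames with 50 steps and a switch at step 40 take 120 denoiser calls rather than the 400 a frame-wise schedule would need. These numbers are arithmetic on the sketch, not results from the paper.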

Taxonomy

Topics: Advanced Vision and Imaging · Video Analysis and Summarization · Human Pose and Action Recognition

Methods: Diffusion