Hierarchical Patch Diffusion Models for High-Resolution Video Generation

Ivan Skorokhodov; Willi Menapace; Aliaksandr Siarohin; Sergey Tulyakov

arXiv:2406.07792 · cs.CV · June 13, 2024

TL;DR

This paper introduces hierarchical patch diffusion models that improve high-resolution video generation through deep context fusion, which enforces consistency between patches, and adaptive computation. The approach achieves state-of-the-art results on UCF-101 and enables efficient end-to-end training at ultra-high resolutions.

Contribution

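The paper presents a hierarchical patch diffusion architecture with two components: deep context fusion, which propagates context from low-scale to high-scale patches, and adaptive computation, which allocates more network capacity to coarse details. Together they make end-to-end training on high-resolution video practical.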
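Adaptive computation, as the abstract puts it, allocates more network capacity and computation towards coarse details. Below is a minimal sketch of one way to do that, assuming a shared stack of backbone blocks in which the coarsest patch passes through every block and each finer scale skips an equal share of the leading blocks. This schedule and the names `blocks_for_scale` and `run_hierarchy` are illustrative assumptions, not the paper's exact design.

```python
import numpy as np


def blocks_for_scale(num_blocks: int, num_scales: int, scale: int) -> range:
    """Indices of backbone blocks applied to patches at a given scale.

    Scale 0 is the coarsest level. Assumed (hypothetical) schedule: the
    coarsest level runs every block, and each finer level skips an equal
    share of the leading blocks, so fine patches get less depth.
    """
    skip_per_scale = num_blocks // num_scales
    start = scale * skip_per_scale
    return range(start, num_blocks)


def run_hierarchy(blocks, patches):
    """Apply the scale-dependent subset of blocks to each patch.

    blocks: list of callables, one per backbone block.
    patches: list of arrays ordered from coarsest to finest scale.
    """
    outputs = []
    for scale, x in enumerate(patches):
        for i in blocks_for_scale(len(blocks), len(patches), scale):
            x = blocks[i](x)
        outputs.append(x)
    return outputs


# Toy usage: each "block" adds 1, so the output counts blocks applied.
toy_blocks = [lambda x: x + 1 for _ in range(12)]
outs = run_hierarchy(toy_blocks, [np.zeros(2)] * 3)
print([int(o[0]) for o in outs])  # [12, 8, 4]
```

With 12 blocks and 3 scales, the coarsest patch passes through 12 blocks, the middle one through 8, and the finest through 4, so per-patch cost falls as resolution rises.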

Findings

01

Achieved a new state-of-the-art FVD score of 66.32 on UCF-101.

02

Surpassed recent methods by more than 100% in Inception Score.

03

First diffusion-based model trained end-to-end on ultra-high-resolution videos.

Abstract

Diffusion models have demonstrated remarkable performance in image and video synthesis. However, scaling them to high-resolution inputs is challenging and requires restructuring the diffusion pipeline into multiple independent components, limiting scalability and complicating downstream applications. In this work, we study patch diffusion models (PDMs), a diffusion paradigm that models the distribution of patches rather than whole inputs. This makes it very efficient during training and unlocks end-to-end optimization on high-resolution videos. We improve PDMs in two principled ways. First, to enforce consistency between patches, we develop deep context fusion: an architectural technique that propagates the context information from low-scale to high-scale patches in a hierarchical manner. Second, to accelerate training and inference, we propose adaptive computation, which allocates more network capacity and computation towards coarse image details. The resulting model sets a new state-of-the-art FVD score of 66.32 and…
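Deep context fusion, as the abstract describes it, passes information from a lower-scale patch to the higher-scale patch nested inside it. The NumPy sketch below shows one plausible form under stated assumptions: features are 2D `(C, H, W)` maps for a single frame (a video version would add a time axis), the fine patch's location is a box normalized to the coarse patch, and fusion is plain channel concatenation after nearest-neighbour resizing. The names `crop_context`, `upsample_nearest`, and `fuse_context` are hypothetical and do not come from the paper's code.

```python
import numpy as np


def crop_context(coarse_feats, box):
    """Crop the region of a coarse feature map that a finer patch covers.

    coarse_feats: array of shape (C, H, W) for the coarse-scale patch.
    box: (top, left, size) of the fine patch, normalized to [0, 1]
         relative to the coarse patch (hypothetical convention).
    """
    _, h, w = coarse_feats.shape
    top, left, size = box
    y0, x0 = int(round(top * h)), int(round(left * w))
    y1, x1 = int(round((top + size) * h)), int(round((left + size) * w))
    return coarse_feats[:, y0:y1, x0:x1]


def upsample_nearest(x, out_h, out_w):
    """Nearest-neighbour resize of a (C, H, W) array to (C, out_h, out_w)."""
    _, h, w = x.shape
    rows = np.arange(out_h) * h // out_h  # source row for each output row
    cols = np.arange(out_w) * w // out_w  # source column for each output column
    return x[:, rows][:, :, cols]


def fuse_context(fine_feats, coarse_feats, box):
    """Concatenate resized coarse context with the fine patch features."""
    _, out_h, out_w = fine_feats.shape
    ctx = crop_context(coarse_feats, box)
    ctx = upsample_nearest(ctx, out_h, out_w)
    return np.concatenate([fine_feats, ctx], axis=0)


# Toy usage: the fine patch covers the right half, middle rows of the coarse one.
rng = np.random.default_rng(0)
coarse = rng.standard_normal((8, 16, 16))
fine = rng.standard_normal((8, 16, 16))
fused = fuse_context(fine, coarse, box=(0.25, 0.5, 0.5))
print(fused.shape)  # (16, 16, 16)
```

Concatenation keeps the sketch short; the paper may combine the two feature sets differently, and applying the same step at every level of the hierarchy is what makes the context flow from coarse to fine scales.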

Taxonomy

Topics: Video Coding and Compression Technologies · Computer Graphics and Visualization Techniques · Image and Video Quality Assessment

Methods: Balanced Selection · Diffusion