Minute-Long Videos with Dual Parallelisms

Zeqing Wang; Bowen Zheng; Xingyi Yang; Zhenxiong Tan; Yuecong Xu; Xinchao Wang

arXiv:2505.21070 · cs.CV · May 30, 2025

TL;DR

This paper introduces DualParal, a distributed inference method for diffusion transformer video models that parallelizes across both temporal frames and model layers, significantly reducing latency and memory costs for long video generation.

Contribution

We propose a novel distributed inference strategy that enables efficient, asynchronous, and scalable long video generation by parallelizing across frames and layers with synchronization and feature caching.

Findings

01. Achieves 6.54× lower latency on 8×RTX 4090 GPUs.

02. Reduces memory cost by 1.48× while maintaining quality.

03. Enables generation of 1025-frame videos with artifact-free quality.

Abstract

Diffusion Transformer (DiT)-based video diffusion models generate high-quality videos at scale but incur prohibitive processing latency and memory costs for long videos. To address this, we propose a novel distributed inference strategy, termed DualParal. The core idea is that, instead of generating an entire video on a single GPU, we parallelize both temporal frames and model layers across GPUs. However, a naive implementation of this division faces a key limitation: since diffusion models require synchronized noise levels across frames, this implementation leads to the serialization of original parallelisms. We leverage a block-wise denoising scheme to handle this. Namely, we process a sequence of frame blocks through the pipeline with progressively decreasing noise levels. Each GPU handles a specific block and layer subset while passing previous results to the next GPU, enabling…
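The block-wise denoising scheme described in the abstract can be illustrated with a toy scheduler. The sketch below is not the authors' implementation: the function names `split_layers` and `blockwise_schedule` are hypothetical, the queue capacity is assumed to equal the number of denoising steps, and the DiT model, inter-GPU communication, and feature caching are all left out. It only shows how frame blocks can sit in a first-in, first-out queue at staggered noise levels, and how layers might be divided into contiguous per-GPU subsets.

```python
from collections import deque


def split_layers(num_layers, num_devices):
    """Assign contiguous layer ranges to devices as evenly as possible.

    Hypothetical helper: models the layer-parallel half of the idea.
    """
    base, extra = divmod(num_layers, num_devices)
    ranges, start = [], 0
    for device in range(num_devices):
        size = base + (1 if device < extra else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges


def blockwise_schedule(num_blocks, num_steps):
    """Simulate a FIFO queue of frame blocks with staggered noise levels.

    Assumption: every block needs `num_steps` denoising steps and the queue
    holds at most `num_steps` blocks. On each tick a fresh, fully noisy block
    joins the tail (if any remain), every queued block is denoised by one
    step, and blocks that reach zero remaining steps leave from the head.

    Returns (history, finished): `history` is one snapshot per tick of
    (block_id, remaining_steps) pairs taken before that tick's denoising;
    `finished` is the order in which blocks complete.
    """
    queue = deque()
    next_block = 0
    finished = []
    history = []
    while len(finished) < num_blocks:
        # A new pure-noise block enters at the tail.
        if next_block < num_blocks and len(queue) < num_steps:
            queue.append([next_block, num_steps])
            next_block += 1
        history.append([tuple(block) for block in queue])
        # One denoising step for every block currently in the queue.
        for block in queue:
            block[1] -= 1
        # The head block is always the cleanest; pop it once it is done.
        while queue and queue[0][1] == 0:
            finished.append(queue.popleft()[0])
    return history, finished


if __name__ == "__main__":
    history, finished = blockwise_schedule(num_blocks=4, num_steps=3)
    for tick, snapshot in enumerate(history):
        print(tick, snapshot)
    print("finished order:", finished)
    print("layer split:", split_layers(num_layers=10, num_devices=4))
```

In each snapshot the remaining step count grows from the head of the queue to the tail, which mirrors the "progressively decreasing noise levels" in the abstract: no two queued blocks need the same noise level, so each block can advance one step per tick without waiting on the others. Once the queue is full, one finished block leaves per tick.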

Code & Models

Repositories

dualparal-project/dualparal
PyTorch · Official

Videos

Minute-Long Videos with Dual Parallelisms · underline

Taxonomy

Topics: Image and Video Quality Assessment · Generative Adversarial Networks and Image Synthesis · Video Coding and Compression Technologies

Methods: Attention Is All You Need · Linear Layer · Dense Connections · Softmax · Position-Wise Feed-Forward Layer · Absolute Position Encodings · Label Smoothing · Multi-Head Attention · Layer Normalization · Byte Pair Encoding