TL;DR
Jenga is a novel inference pipeline that significantly accelerates video diffusion models by dynamically reducing attention complexity and resolution during generation, enabling practical, high-quality video synthesis without retraining.
Contribution
Jenga combines dynamic attention carving with a progressive resolution strategy to speed up video diffusion inference with minimal quality loss. It works as a plug-and-play method that needs no retraining.
Findings
Achieves 8.83× speedup with minimal quality loss.
Maintains generation quality comparable to state-of-the-art models.
Reduces inference time from minutes to seconds.
Abstract
Despite the remarkable generation quality of video Diffusion Transformer (DiT) models, their practical deployment is severely hindered by extensive computational requirements. This inefficiency stems from two key challenges: the quadratic complexity of self-attention with respect to token length and the multi-step nature of diffusion models. To address these limitations, we present Jenga, a novel inference pipeline that combines dynamic attention carving with progressive resolution generation. Our approach leverages two key insights: (1) early denoising steps do not require high-resolution latents, and (2) later steps do not require dense attention. Jenga introduces a block-wise attention mechanism that dynamically selects relevant token interactions using 3D space-filling curves, alongside a progressive resolution strategy that gradually increases latent resolution during generation.…
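The block-wise selection idea in the abstract can be illustrated with a small NumPy sketch. Everything here is an assumption made for illustration and not the authors' implementation: the function names are hypothetical, a Z-order (Morton) curve stands in for whichever 3D space-filling curve the paper uses, and key blocks are ranked by the dot product of mean-pooled queries and keys, keeping the top `top_k` per query block.

```python
import numpy as np

def morton_key(t, y, x, bits=10):
    """Interleave the bits of (t, y, x) into one Z-order (Morton) index."""
    key = 0
    for b in range(bits):
        key |= ((x >> b) & 1) << (3 * b)
        key |= ((y >> b) & 1) << (3 * b + 1)
        key |= ((t >> b) & 1) << (3 * b + 2)
    return key

def curve_order(T, H, W):
    """Permutation that reorders a row-major (T, H, W) token grid along the curve."""
    keys = np.array([morton_key(t, y, x)
                     for t in range(T) for y in range(H) for x in range(W)])
    return np.argsort(keys, kind="stable")

def select_blocks(q, k, block_size, top_k):
    """Boolean [n_blocks, n_blocks] mask: which key blocks each query block keeps."""
    n, d = q.shape
    assert n % block_size == 0, "token count must be a multiple of block_size"
    n_blocks = n // block_size
    # Summarize each block by the mean of its tokens (an assumed heuristic).
    q_mean = q.reshape(n_blocks, block_size, d).mean(axis=1)
    k_mean = k.reshape(n_blocks, block_size, d).mean(axis=1)
    scores = q_mean @ k_mean.T
    keep = np.argsort(-scores, axis=1)[:, :top_k]
    mask = np.zeros((n_blocks, n_blocks), dtype=bool)
    np.put_along_axis(mask, keep, True, axis=1)
    np.fill_diagonal(mask, True)  # always keep the query block's own key block
    return mask

def block_sparse_attention(q, k, v, mask, block_size):
    """Reference attention restricted to the selected blocks.

    This computes dense logits and masks them, so it shows the semantics only;
    a real kernel would skip the masked blocks to save compute.
    """
    d = q.shape[1]
    token_mask = np.repeat(np.repeat(mask, block_size, axis=0), block_size, axis=1)
    logits = np.where(token_mask, q @ k.T / np.sqrt(d), -np.inf)
    logits = logits - logits.max(axis=1, keepdims=True)  # finite: diagonal is kept
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ v
```

In this sketch, tokens are first permuted with `curve_order` so that each contiguous block of `block_size` tokens covers a compact spatio-temporal neighborhood, and the output is mapped back with the inverse permutation. With `n_blocks` query blocks and `top_k` kept key blocks, the attended fraction is at most `(top_k + 1) / n_blocks` of the dense cost, which is where a sparse kernel would get its savings.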
Taxonomy
Methods: Attention Is All You Need · Linear Layer · Layer Normalization · Multi-Head Attention · Dense Connections · Softmax · Diffusion · Position-Wise Feed-Forward Layer · Absolute Position Encodings · Residual Connection
