Magic 1-For-1: Generating One Minute Video Clips within One Minute

Hongwei Yi; Shitong Shao; Tian Ye; Jiantong Zhao; Qingyu Yin; Michael Lingelbach; Li Yuan; Yonghong Tian; Enze Xie; Daquan Zhou

arXiv:2502.07701·cs.CV·February 18, 2025

TL;DR

Magic 1-For-1 introduces an efficient method for generating one-minute video clips in under a minute by decomposing the task into simpler steps and applying various optimization techniques to reduce computational costs.

Contribution

The paper proposes a novel two-step diffusion-based approach for fast text-to-video generation, optimizing memory and inference speed, and demonstrating high-quality, long-duration video synthesis.

Findings

01 Generated 5-second videos in 3 seconds.

02 Produced one-minute videos within one minute.

03 Achieved improved visual quality and motion dynamics.
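
As a rough consistency check on the first two findings, the arithmetic can be sketched as follows. This is only an illustration: it assumes a one-minute video is assembled from sequential, non-overlapping 5-second clips that each cost about 3 seconds to generate, which this listing does not state. The function name and its parameters are hypothetical.

```python
import math


def estimate_generation_seconds(
    target_seconds: float,
    clip_seconds: float = 5.0,      # clip length reported in finding 01
    seconds_per_clip: float = 3.0,  # generation time reported in finding 01
) -> float:
    """Estimate wall-clock time to cover `target_seconds` of video.

    Assumption (not from the source): clips are generated one after
    another, do not overlap, and each costs the same amount of time.
    """
    num_clips = math.ceil(target_seconds / clip_seconds)
    return num_clips * seconds_per_clip


# One minute of video under the assumptions above.
estimate = estimate_generation_seconds(60.0)
print(f"Estimated generation time: {estimate:.0f} s")
```

Under these assumptions the estimate stays below 60 seconds, which is consistent with finding 02; any overlap between clips or per-clip overhead would raise it.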

Abstract

In this technical report, we present Magic 1-For-1 (Magic141), an efficient video generation model with optimized memory consumption and inference latency. The key idea is simple: factorize the text-to-video generation task into two separate, easier tasks for diffusion step distillation, namely text-to-image generation and image-to-video generation. We verify that, with the same optimization algorithm, the image-to-video task indeed converges more easily than the text-to-video task. We also explore a bag of optimization tricks to reduce the computational cost of training the image-to-video (I2V) models from three aspects: 1) model convergence speedup by using multi-modal prior condition injection; 2) inference latency speedup by applying adversarial step distillation; and 3) inference memory cost optimization with parameter sparsification. With those techniques, we are able to…
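
The factorization described in the abstract can be pictured as a two-stage pipeline. The sketch below is a toy illustration only: `Frame`, `text_to_image`, `image_to_video`, and `text_to_video` are hypothetical stand-ins, not the authors' code or the repository's API, and the stubs produce labels rather than running any diffusion model.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Frame:
    """Toy stand-in for an image: a text label instead of pixels."""
    description: str


def text_to_image(prompt: str) -> Frame:
    # Stage 1 (hypothetical stub): a real system would run a
    # text-to-image diffusion model here.
    return Frame(description=f"first frame for: {prompt}")


def image_to_video(first_frame: Frame, prompt: str, num_frames: int) -> List[Frame]:
    # Stage 2 (hypothetical stub): a real system would run an
    # image-to-video model conditioned on the first frame and the prompt.
    # The prompt is accepted but unused by this stub.
    if num_frames < 1:
        raise ValueError("num_frames must be at least 1")
    frames = [first_frame]
    for step in range(1, num_frames):
        frames.append(Frame(description=f"{first_frame.description} | step {step}"))
    return frames


def text_to_video(prompt: str, num_frames: int = 8) -> List[Frame]:
    # Text-to-video expressed as the composition of the two easier tasks.
    first_frame = text_to_image(prompt)
    return image_to_video(first_frame, prompt, num_frames)


clip = text_to_video("a red kite over a beach", num_frames=4)
for frame in clip:
    print(frame.description)
```

The point of the composition is that the second stage starts from a concrete image rather than from text alone, which is the task the abstract reports as easier to converge.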

Code & Models

Repositories

da-group-pku/magic-1-for-1
PyTorch · Official

Videos

NVIDIA’s New AI: The Age of Real Time Game Making Is Here! · YouTube

Taxonomy

Topics: Video Analysis and Summarization · Multimedia Communication and Technology · Video Coding and Compression Technologies

Methods: Diffusion · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings