MotionAura: Generating High-Quality and Motion Consistent Videos using Discrete Diffusion

Onkar Susladkar; Jishu Sen Gupta; Chirag Sehgal; Sparsh Mittal; Rekha Singhal

arXiv:2410.07659·cs.CV·March 12, 2025

TL;DR

MotionAura introduces a novel framework combining vector-quantized diffusion models, spectral transformers, and a new VAE for high-quality, motion-consistent text-to-video generation and inpainting, advancing spatiotemporal video processing.

Contribution

The paper presents a new 3D VAE for video compression, a vector-quantized diffusion model for text-to-video synthesis, and a spectral transformer for improved denoising, along with a sketch-guided inpainting task.

Findings

01 The 3D-MBQ-VAE achieves state-of-the-art (SOTA) reconstruction quality.

02 MotionAura produces temporally coherent videos aligned with text prompts.

03 The framework demonstrates superior performance on multiple benchmarks.

Abstract

The spatio-temporal complexity of video data presents significant challenges in tasks such as compression, generation, and inpainting. We present four key contributions to address the challenges of spatiotemporal video processing. First, we introduce the 3D Mobile Inverted Vector-Quantization Variational Autoencoder (3D-MBQ-VAE), which combines Variational Autoencoders (VAEs) with masked token modeling to enhance spatiotemporal video compression. The model achieves superior temporal consistency and state-of-the-art (SOTA) reconstruction quality by employing a novel training strategy with full frame masking. Second, we present MotionAura, a text-to-video generation framework that utilizes vector-quantized diffusion models to discretize the latent space and capture complex motion dynamics, producing temporally coherent videos aligned with text prompts. Third, we propose a spectral…
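The abstract names two ingredients that are easy to picture in isolation: mapping continuous latents to discrete codebook indices (vector quantization) and hiding entire frames of tokens during training (full frame masking). The sketch below illustrates both ideas in plain Python under simplifying assumptions. It is not the authors' implementation: the function names `quantize` and `mask_full_frames`, the `MASK_ID` sentinel, the toy codebook, and the uniform random choice of masked frames are all hypothetical choices made for illustration.

```python
import random

MASK_ID = -1  # hypothetical sentinel marking a masked token


def quantize(vectors, codebook):
    """Return, for each vector, the index of its nearest codebook entry
    under squared Euclidean distance (ties go to the lowest index)."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [
        min(range(len(codebook)), key=lambda k: sqdist(v, codebook[k]))
        for v in vectors
    ]


def mask_full_frames(video_tokens, num_masked, rng):
    """Replace every token of `num_masked` randomly chosen frames with MASK_ID.

    video_tokens is a list of frames, each frame a list of token indices.
    Returns the masked copy and the sorted indices of the masked frames.
    """
    masked = set(rng.sample(range(len(video_tokens)), num_masked))
    out = [
        [MASK_ID] * len(frame) if t in masked else list(frame)
        for t, frame in enumerate(video_tokens)
    ]
    return out, sorted(masked)


# Toy usage: a 3-entry codebook of 2-D vectors (assumed values).
codebook = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
latents = [(0.1, 0.1), (0.9, 0.8), (0.2, 0.9)]
indices = quantize(latents, codebook)

# Four frames of three tokens each; mask two whole frames.
video = [[0, 1, 2], [2, 2, 1], [1, 0, 0], [0, 0, 1]]
masked_video, masked_frames = mask_full_frames(video, 2, random.Random(0))
```

In a masked-token training setup, a model would be asked to predict the original tokens of the masked frames from the visible ones; masking whole frames rather than scattered tokens forces it to rely on temporal context instead of spatial neighbors within the same frame.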

Code & Models

Repositories

CandleLabAI/MotionAura-ICLR-2025
JAX · Official

Taxonomy

Topics: Video Analysis and Summarization · Video Coding and Compression Technologies · Advanced Vision and Imaging

Methods: Inpainting · Diffusion