MotionAura: Generating High-Quality and Motion Consistent Videos using Discrete Diffusion
Onkar Susladkar, Jishu Sen Gupta, Chirag Sehgal, Sparsh Mittal, Rekha Singhal

TL;DR
MotionAura introduces a novel framework combining vector-quantized diffusion models, spectral transformers, and a new VAE for high-quality, motion-consistent text-to-video generation and inpainting, advancing spatiotemporal video processing.
Contribution
The paper presents a new 3D VAE for video compression, a vector-quantized diffusion model for text-to-video synthesis, and a spectral transformer for improved denoising, along with a sketch-guided inpainting task.
Findings
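The 3D-MBQ-VAE achieves state-of-the-art reconstruction quality and improved temporal consistency, using a training strategy based on full frame masking.
MotionAura produces temporally coherent videos aligned with text prompts.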
Demonstrates superior performance on multiple benchmarks.
Abstract
The spatio-temporal complexity of video data presents significant challenges in tasks such as compression, generation, and inpainting. We present four key contributions to address the challenges of spatiotemporal video processing. First, we introduce the 3D Mobile Inverted Vector-Quantization Variational Autoencoder (3D-MBQ-VAE), which combines Variational Autoencoders (VAEs) with masked token modeling to enhance spatiotemporal video compression. The model achieves superior temporal consistency and state-of-the-art (SOTA) reconstruction quality by employing a novel training strategy with full frame masking. Second, we present MotionAura, a text-to-video generation framework that utilizes vector-quantized diffusion models to discretize the latent space and capture complex motion dynamics, producing temporally coherent videos aligned with text prompts. Third, we propose a spectral…
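The two ideas the abstract names for the tokenizer, discretizing latents against a codebook and masking whole frames during training, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the nearest-neighbor lookup is the standard vector-quantization step, and the function names, codebook size, mask id, and masking ratio are all hypothetical values chosen for illustration.

```python
import numpy as np

def quantize(latents, codebook):
    """Map each latent vector to the index of its nearest codebook entry.

    latents:  (T, H, W, D) continuous latents for T frames
    codebook: (K, D) code vectors
    returns:  (T, H, W) integer token ids
    """
    flat = latents.reshape(-1, latents.shape[-1])  # (N, D)
    # Squared Euclidean distance between every latent and every code: (N, K)
    d2 = ((flat[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1).reshape(latents.shape[:-1])

def mask_full_frames(tokens, mask_id, frame_ratio, rng):
    """Replace every token of randomly chosen frames with mask_id.

    Assumption: "full frame masking" is modeled here as hiding all
    tokens of a frame at once, rather than scattered individual tokens.
    """
    num_frames = tokens.shape[0]
    n_masked = max(1, int(round(frame_ratio * num_frames)))
    frames = rng.choice(num_frames, size=n_masked, replace=False)
    masked = tokens.copy()
    masked[frames] = mask_id
    return masked, np.sort(frames)

# Toy example: 8 frames, 2x2 latent grid, 4-dim latents, 16 codes (all hypothetical).
rng = np.random.default_rng(0)
codebook = rng.normal(size=(16, 4))
latents = rng.normal(size=(8, 2, 2, 4))
tokens = quantize(latents, codebook)
masked, frames = mask_full_frames(tokens, mask_id=16, frame_ratio=0.25, rng=rng)
```

In this sketch a model trained to recover `tokens` from `masked` would have to infer entire missing frames from their temporal neighbors, which is one plausible reading of why frame-level masking would encourage temporal consistency.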
Taxonomy
Topics: Video Analysis and Summarization · Video Coding and Compression Technologies · Advanced Vision and Imaging
Methods: Inpainting · Diffusion
