LTX-Video: Realtime Video Latent Diffusion
Yoav HaCohen, Nisan Chiprut, Benny Brazowski, Daniel Shalem, Dudu Moshe, Eitan Richardson, Eran Levin, Guy Shiran, Nir Zabari, Ori Gordon, Poriya Panet, Sapir Weissbuch, Victor Kulikov, Yaki Bitterman, Zeev Melumian, Ofir Bibi

TL;DR
LTX-Video is a transformer-based latent diffusion model that efficiently generates high-resolution, temporally consistent videos in real-time by integrating a high-compression Video-VAE with a denoising transformer.
Contribution
The paper introduces a novel holistic approach that combines Video-VAE and denoising transformer into a unified model, enabling faster and higher-quality video generation.
Findings
Generates 5 seconds of 24 fps video at 768x512 resolution in 2 seconds on an Nvidia H100 GPU, which is faster than real time.
Supports both text-to-video and image-to-video generation.
Outperforms existing models of similar scale in speed and quality.
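To make the speed finding concrete, the short sketch below converts the reported numbers (5 seconds of 24 fps video produced in 2 seconds) into a real-time factor and a frame throughput. The function names are illustrative and not taken from the paper or its code; the arithmetic only restates the figures quoted above.

```python
def realtime_factor(video_seconds: float, wall_seconds: float) -> float:
    """Seconds of video produced per second of wall-clock time."""
    return video_seconds / wall_seconds


def generated_frames_per_second(video_seconds: float, fps: float,
                                wall_seconds: float) -> float:
    """Number of output frames produced per second of wall-clock time."""
    return video_seconds * fps / wall_seconds


# Figures quoted in the Findings list.
rtf = realtime_factor(video_seconds=5, wall_seconds=2)
throughput = generated_frames_per_second(video_seconds=5, fps=24, wall_seconds=2)

print(f"real-time factor: {rtf}x")
print(f"throughput: {throughput} frames per wall-clock second")
```

A real-time factor above 1 means the clip is generated faster than it takes to play back.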
Abstract
We introduce LTX-Video, a transformer-based latent diffusion model that adopts a holistic approach to video generation by seamlessly integrating the responsibilities of the Video-VAE and the denoising transformer. Unlike existing methods, which treat these components as independent, LTX-Video aims to optimize their interaction for improved efficiency and quality. At its core is a carefully designed Video-VAE that achieves a high compression ratio of 1:192, with spatiotemporal downscaling of 32 x 32 x 8 pixels per token, enabled by relocating the patchifying operation from the transformer's input to the VAE's input. Operating in this highly compressed latent space enables the transformer to efficiently perform full spatiotemporal self-attention, which is essential for generating high-resolution videos with temporal consistency. However, the high compression inherently limits the…
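The compression figures in the abstract can be checked with a little arithmetic. The sketch below assumes RGB input (3 channels per pixel) and a 120-frame clip (5 s × 24 fps); neither assumption comes from the abstract, and the model may handle the frame count differently. Under those assumptions, a 1:192 ratio with 32 x 32 x 8 pixels per token implies 128 values per latent token, and a 768x512 clip becomes a few thousand tokens, which is what makes full spatiotemporal self-attention affordable.

```python
from math import ceil, prod

# From the abstract: each token covers 32 x 32 pixels over 8 frames,
# and the overall compression ratio is 1:192.
PATCH = (32, 32, 8)          # (width, height, frames) per token
COMPRESSION_RATIO = 192

# Assumption (not stated in the abstract): 3-channel RGB input.
RGB_CHANNELS = 3


def pixels_per_token(patch=PATCH) -> int:
    """Number of input pixels that one latent token covers."""
    return prod(patch)


def implied_latent_channels(patch=PATCH, ratio=COMPRESSION_RATIO,
                            in_channels=RGB_CHANNELS) -> float:
    """Values per latent token implied by the stated compression ratio."""
    return pixels_per_token(patch) * in_channels / ratio


def token_count(width: int, height: int, frames: int, patch=PATCH) -> int:
    """Tokens needed to cover a clip, rounding each axis up to a whole patch."""
    pw, ph, pt = patch
    return ceil(width / pw) * ceil(height / ph) * ceil(frames / pt)


# Assumption: a 5 s clip at 24 fps is treated as 120 frames.
tokens = token_count(width=768, height=512, frames=120)

print("pixels per token:", pixels_per_token())
print("implied latent channels:", implied_latent_channels())
print("tokens for 768x512, 120 frames:", tokens)
```

Because self-attention cost grows quadratically with sequence length, shrinking the token count by the 8192-pixel patch volume is the main lever behind the efficiency claim.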
Taxonomy
Topics: Video Analysis and Summarization · Generative Adversarial Networks and Image Synthesis · Advanced Data Compression Techniques
Methods: Latent Diffusion Model · Diffusion
