Autoregressive Adversarial Post-Training for Real-Time Interactive Video Generation

Shanchuan Lin; Ceyuan Yang; Hao He; Jianwen Jiang; Yuxi Ren; Xin Xia; Yang Zhao; Xuefeng Xiao; Lu Jiang

arXiv:2506.09350 · cs.CV · October 3, 2025

TL;DR

This paper introduces autoregressive adversarial post-training (AAPT), a method that turns a pre-trained latent video diffusion model into a real-time, interactive video generator that streams video at 24fps.

Contribution

The paper presents a new training paradigm combining autoregressive and adversarial techniques to enable real-time, interactive video generation from pre-trained diffusion models.

Findings

01. Achieves 24fps streaming at 736x416 resolution on a single H100 GPU.

02. Supports up to 1440 frames (about a minute) at 1280x720 resolution on 8 H100 GPUs.

03. Reduces error accumulation in long video generation through student-forcing training.
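
The figures above can be sanity-checked with a little arithmetic. The sketch below is illustrative only: the function names and the `frames_per_call` parameter are assumptions made for this example, not values or code taken from the paper.

```python
def clip_seconds(num_frames: int, fps: int) -> float:
    """Playback duration of a clip in seconds."""
    return num_frames / fps


def call_budget_ms(fps: int, frames_per_call: int = 1) -> float:
    """Average time available per generation call to keep up with playback.

    frames_per_call is a hypothetical parameter: the number of video frames
    produced by one generation call (1 means one call per displayed frame).
    """
    return 1000.0 * frames_per_call / fps


if __name__ == "__main__":
    # 1440 frames at 24fps: the "about a minute" quoted in the findings.
    print(clip_seconds(1440, 24))
    # Average per-frame time budget at 24fps, in milliseconds.
    print(round(call_budget_ms(24), 1))
```

At 24fps the generator has on average just under 42 ms per displayed frame, which is why a single network evaluation per step matters for real-time streaming.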

Abstract

Existing large-scale video generation models are computationally intensive, preventing adoption in real-time and interactive applications. In this work, we propose autoregressive adversarial post-training (AAPT) to transform a pre-trained latent video diffusion model into a real-time, interactive video generator. Our model autoregressively generates a latent frame at a time using a single neural function evaluation (1NFE). The model can stream the result to the user in real time and receive interactive responses as controls to generate the next latent frame. Unlike existing approaches, our method explores adversarial training as an effective paradigm for autoregressive generation. This not only allows us to design an architecture that is more efficient for one-step generation while fully utilizing the KV cache, but also enables training the model in a student-forcing manner that proves…
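
The generation loop described in the abstract can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the authors' implementation: `toy_generator`, `stream_frames`, the list-based cache, and the scalar `control` are all hypothetical stand-ins. The real model is a neural network that attends over cached keys and values.

```python
from typing import Callable, Iterable, Iterator, List

Latent = List[float]
Generator = Callable[[Latent, float, List[Latent]], Latent]


def toy_generator(prev: Latent, control: float, cache: List[Latent]) -> Latent:
    """Hypothetical stand-in for the network; one call is one function evaluation."""
    # Average the cached frames to mimic reusing stored context instead of
    # recomputing it (a crude analogue of reading from a KV cache).
    n = len(cache)
    context = [sum(f[i] for f in cache) / n for i in range(len(prev))]
    return [0.5 * p + 0.5 * c + control for p, c in zip(prev, context)]


def stream_frames(
    generator: Generator, first_frame: Latent, controls: Iterable[float]
) -> Iterator[Latent]:
    """Yield one latent frame per control input, as soon as it is generated."""
    cache: List[Latent] = []
    frame = first_frame
    for control in controls:
        cache.append(frame)  # only the newest frame is added to the cache
        # Single evaluation per step; the model's own output becomes the next input.
        frame = generator(frame, control, cache)
        yield frame  # stream the frame before the next control arrives


if __name__ == "__main__":
    for latent in stream_frames(toy_generator, [0.0, 0.0], [1.0, 0.0, -1.0]):
        print(latent)
```

Because `stream_frames` is a Python generator, each frame is available to the caller before the next control is consumed, which mirrors the stream-then-respond interaction the abstract describes.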

Videos

Autoregressive Adversarial Post-Training for Real-Time Interactive Video Generation · SlidesLive

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Image Enhancement Techniques · Human Pose and Action Recognition