Towards One-step Causal Video Generation via Adversarial Self-Distillation
Yongqi Yang, Huayang Huang, Xu Peng, Xiaobin Hu, Donghao Luo, Jiangning Zhang, Chengjie Wang, Yu Wu

TL;DR
This paper introduces a novel adversarial self-distillation framework for efficient one-step causal video generation, significantly reducing inference time while maintaining high quality, and supporting flexible multi-step generation without re-distillation.
Contribution
The paper proposes an Adversarial Self-Distillation (ASD) strategy within the Distribution Matching Distillation (DMD) framework, enabling high-quality video synthesis in very few denoising steps, and a First-Frame Enhancement (FFE) technique that improves stability and quality.
Findings
Outperforms state-of-the-art methods in one- and two-step video generation
Supports multiple inference steps with a single distilled model
Achieves high-quality video synthesis with minimal denoising steps
Abstract
Recent hybrid video generation models combine autoregressive temporal dynamics with diffusion-based spatial denoising, but their sequential, iterative nature leads to error accumulation and long inference times. In this work, we propose a distillation-based framework for efficient causal video generation that enables high-quality synthesis with extremely limited denoising steps. Our approach builds upon the Distribution Matching Distillation (DMD) framework and proposes a novel Adversarial Self-Distillation (ASD) strategy, which aligns the outputs of the student model's n-step denoising process with its (n+1)-step version at the distribution level. This design provides smoother supervision by bridging small intra-student gaps and more informative guidance by combining teacher knowledge with locally consistent student behavior, substantially improving training stability and generation…
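The ASD idea in the abstract (matching the student's n-step outputs to its own (n+1)-step outputs at the distribution level) can be illustrated with a toy sketch. This is not the authors' implementation: the scalar "video", the linear denoiser `theta`, the logistic discriminator `phi`, the evenly spaced noise levels, and the non-saturating GAN losses are all assumptions made for illustration, and the DMD teacher term is omitted entirely.

```python
import math
import random


def softplus(v):
    # Numerically stable log(1 + exp(v)).
    return max(v, 0.0) + math.log1p(math.exp(-abs(v)))


def student_rollout(z, n_steps, theta, rng):
    """Toy n-step sampler: predict a clean sample, re-noise, repeat.

    `theta` is a hypothetical two-parameter linear denoiser; a real
    student would be a causal video diffusion network.
    """
    x = z
    # Evenly spaced noise levels ending at 0.0 (an assumed schedule).
    levels = [1.0 - i / n_steps for i in range(1, n_steps + 1)]
    for t_next in levels:
        x0_hat = theta[0] * x + theta[1]  # toy clean-sample prediction
        # Re-noise to the next level; at t_next == 0 this returns x0_hat.
        x = (1.0 - t_next) * x0_hat + t_next * rng.gauss(0.0, 1.0)
    return x


def disc_logit(x, phi):
    # Hypothetical logistic discriminator on a scalar sample.
    return phi[0] * x + phi[1]


def asd_losses(theta, phi, n_steps, batch=256, seed=0):
    """Return (discriminator loss, generator loss) for one toy batch.

    The (n+1)-step samples play the role of "real" data and the n-step
    samples the role of "fake" data. In actual training the (n+1)-step
    branch would presumably be treated as a fixed target (no gradient).
    """
    rng = random.Random(seed)
    fake = [student_rollout(rng.gauss(0.0, 1.0), n_steps, theta, rng)
            for _ in range(batch)]
    target = [student_rollout(rng.gauss(0.0, 1.0), n_steps + 1, theta, rng)
              for _ in range(batch)]
    d_loss = (sum(softplus(-disc_logit(x, phi)) for x in target)
              + sum(softplus(disc_logit(x, phi)) for x in fake)) / batch
    g_loss = sum(softplus(-disc_logit(x, phi)) for x in fake) / batch
    return d_loss, g_loss


if __name__ == "__main__":
    d_loss, g_loss = asd_losses(theta=(0.5, 1.0), phi=(0.3, -0.1), n_steps=1)
    print(f"discriminator loss: {d_loss:.4f}, generator loss: {g_loss:.4f}")
```

Because `n_steps` is an ordinary argument of `student_rollout`, the same toy student can be sampled with one, two, or more steps, which mirrors the claim that a single distilled model supports several inference step counts.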
Peer Reviews
Decision·ICLR 2026 Poster
1. The paper is clearly written and easy to follow. The proposed methods (ASD and FFE) are well-motivated and clearly elaborated.
2. The results look very impressive. Compared to the Self-Forcing baseline, the 1-step video generation exhibits a great boost in quality. The speed of 1-step causal generation will enable a wider deployment of streaming video generation.
One minor concern might be conceptual novelty. The main method of this work, ASD, is not fundamentally new [1,2]. Considering the value and impact of 1-step causal video generation, the engineering effort to tune an end-to-end pipeline is a significant contribution, especially since the authors provide the code for replication in the supplementary material.
[1] Zhang et al., SF-V: Single Forward Video Generation Model.
[2] Lin et al., Autoregressive Adversarial Post-Training for Real-Time I
- The paper is generally well written and presented.
- The 1- and 2-step video results provided, both in the paper and in the supplementary material, show better results than current competitors.
- The method supports both single-step and few-/multi-step inference, which is a major advantage over methods trained for a fixed number of steps.
- The First Frame Strategy seems to be an important observation, which by itself already improves on state-of-the-art results.
- The influence of ASD and FFE is cleanly ablated, showing the supe
## Incremental contribution:
- The components used are not fundamentally new in nature. DMD is very well established for model distillation and remains a core component in this work as well.
- Similar adversarial diffusion distillation has been proposed before and is well established in the community.
- As such, no fundamentally new concepts are presented, but their combination provides a nice contribution to the current state of research.
## Limited information on experiments:
- There are
1. Well-Motivated and Significant Problem: The paper addresses a highly relevant and challenging problem in generative AI: achieving high-fidelity video synthesis under extreme computational constraints (one or two inference steps). The motivation is clear, and a successful solution to this problem would have a significant practical impact, making the research direction valuable.
2. Empirically Effective and Intuitive Core Ideas:
a) ASD's Practical Efficacy: The core idea of Adversarial Self-Dis
1. Methodological Foundation of ASD Lacks Rigor: The central contribution, Adversarial Self-Distillation (ASD), is built on a foundation that is more intuitive than it is rigorous. The paper's core claims (that the "intra-student gap" is smaller and that adversarial alignment provides "smoother supervision") are presented as assertions rather than demonstrated principles.
a) There is no formal analysis or empirical measurement to quantify this "gap" (e.g., in terms of a specific distribution diver
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning · Model Reduction and Neural Networks
