Reward Forcing: Efficient Streaming Video Generation with Rewarded Distribution Matching Distillation

Yunhong Lu; Yanhong Zeng; Haobo Li; Hao Ouyang; Qiuyu Wang; Ka Leong Cheng; Jiapeng Zhu; Hengyuan Cao; Zhipeng Zhang; Xing Zhu; Yujun Shen; Min Zhang

arXiv:2512.04678·cs.CV·December 30, 2025

TL;DR

This paper introduces Reward Forcing, a novel framework for streaming video generation that improves motion dynamics and long-term consistency using EMA-Sink tokens and Rewarded Distribution Matching Distillation, achieving state-of-the-art results.

Contribution

The paper proposes EMA-Sink for better long-term context and introduces Re-DMD to prioritize dynamic content in distillation, enhancing motion quality in streaming video generation.

Findings

01

Achieves state-of-the-art performance on benchmarks.

02

Generates high-quality streaming videos at 23.1 FPS.

03

Effectively maintains long-term consistency and dynamic motion.

Abstract

Efficient streaming video generation is critical for simulating interactive and dynamic worlds. Existing methods distill few-step video diffusion models with sliding window attention, using initial frames as sink tokens to maintain attention performance and reduce error accumulation. However, video frames become overly dependent on these static tokens, resulting in copied initial frames and diminished motion dynamics. To address this, we introduce Reward Forcing, a novel framework with two key designs. First, we propose EMA-Sink, which maintains fixed-size tokens initialized from initial frames and continuously updated by fusing evicted tokens via exponential moving average as they exit the sliding window. Without additional computation cost, EMA-Sink tokens capture both long-term context and recent dynamics, preventing initial frame copying while maintaining long-horizon consistency.…
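The EMA-Sink update described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the class name `EmaSinkCache`, the parameters `window_size` and `alpha`, and the choice to fuse one evicted frame into a sink of the same shape are all hypothetical, and plain Python lists stand in for the key/value tensors a real model would cache. The abstract does not specify a decay value.

```python
from collections import deque


class EmaSinkCache:
    """Toy sliding-window cache with an EMA-updated sink (illustrative only)."""

    def __init__(self, first_frame_tokens, window_size, alpha=0.9):
        # The sink is initialized from the initial frame and keeps a fixed size.
        self.sink = [[float(x) for x in tok] for tok in first_frame_tokens]
        self.window = deque()           # most recent frames, oldest on the left
        self.window_size = window_size  # hypothetical: max frames in the window
        self.alpha = alpha              # hypothetical decay; not given in the abstract

    def push(self, frame_tokens):
        frame = [[float(x) for x in tok] for tok in frame_tokens]
        if len(frame) != len(self.sink):
            raise ValueError("frame must have as many tokens as the sink")
        self.window.append(frame)
        if len(self.window) > self.window_size:
            # The oldest frame leaves the window; fuse it into the sink
            # with an exponential moving average instead of discarding it.
            evicted = self.window.popleft()
            a = self.alpha
            self.sink = [
                [a * s + (1.0 - a) * e for s, e in zip(s_tok, e_tok)]
                for s_tok, e_tok in zip(self.sink, evicted)
            ]

    def context(self):
        # Tokens a new frame would attend to: sink first, then the window.
        tokens = list(self.sink)
        for frame in self.window:
            tokens.extend(frame)
        return tokens


# Usage: 2 tokens per frame, 3 channels, window of 2 frames.
cache = EmaSinkCache([[0.0] * 3] * 2, window_size=2, alpha=0.5)
for value in (1.0, 2.0, 3.0):
    cache.push([[value] * 3] * 2)
# The frame of 1.0s was evicted, so each sink entry is 0.5*0.0 + 0.5*1.0 = 0.5.
```

The context length stays constant once the window is full, which is consistent with the abstract's claim of no additional computation cost: the sink is updated in place rather than grown.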

Code & Models

Models

Hugging Face model: JaydenLu666/Reward-Forcing-T2V-1.3B (♡ 10)

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Human Motion and Animation · Face recognition and analysis