Long-Horizon Streaming Video Generation via Hybrid Attention with Decoupled Distillation

Ruibin Li; Tao Yang; Fangzhou Ai; Tianhe Wu; Shilei Wen; Bingyue Peng; Lei Zhang

arXiv:2604.10103 · cs.CV · April 29, 2026

TL;DR

This paper introduces Hybrid Forcing, which combines lightweight linear temporal attention and block-sparse attention with decoupled distillation to enable real-time, long-horizon streaming video generation with state-of-the-art quality.

Contribution

The paper proposes a hybrid attention mechanism and a tailored distillation strategy that improve long-range dependency modeling and computational efficiency in streaming video generation.

Findings

01. Achieves real-time 832x480 video at 29.5 FPS on a single GPU.

02. Outperforms existing methods on short- and long-form video benchmarks.

03. Maintains long-range dependencies with negligible overhead.

Abstract

Streaming video generation (SVG) distills a pretrained bidirectional video diffusion model into an autoregressive model equipped with sliding window attention (SWA). However, SWA inevitably loses distant history during long video generation, and its computational overhead remains a critical challenge to real-time deployment. In this work, we propose Hybrid Forcing, which jointly optimizes temporal information retention and computational efficiency through a hybrid attention design. First, we introduce lightweight linear temporal attention to preserve long-range dependencies beyond the sliding window. In particular, we maintain a compact key-value state to incrementally absorb evicted tokens, retaining temporal context with negligible memory and computational overhead. Second, we incorporate block-sparse attention into the local sliding window to reduce redundant computation within…
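
The idea of a compact key-value state that absorbs tokens evicted from the sliding window can be illustrated with a generic linear-attention recurrence. The following is a minimal sketch under stated assumptions, not the authors' implementation: the class name `HybridCache`, the ELU+1 feature map, the single-head list-based layout, and the choice to return the local and history outputs separately are all hypothetical. The abstract does not say how Hybrid Forcing combines the two branches, so the sketch leaves that step open.

```python
import math
from collections import deque


def feature_map(x):
    # elu(x) + 1: a positive feature map often used in linear attention.
    # Assumption: the paper's actual feature map is not given in the abstract.
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]


class HybridCache:
    """Hypothetical cache: a sliding window of recent (key, value) pairs
    plus a fixed-size linear-attention state for everything evicted."""

    def __init__(self, window, dim_k, dim_v):
        self.window = window
        self.dim_v = dim_v
        self.recent = deque()  # tokens still inside the sliding window
        # S accumulates phi(k) v^T, z accumulates phi(k); both have fixed size.
        self.S = [[0.0] * dim_v for _ in range(dim_k)]
        self.z = [0.0] * dim_k

    def append(self, k, v):
        self.recent.append((k, v))
        if len(self.recent) > self.window:
            old_k, old_v = self.recent.popleft()
            # Absorb the evicted token into the compact state instead of dropping it.
            phi = feature_map(old_k)
            for i, p in enumerate(phi):
                self.z[i] += p
                for j, vj in enumerate(old_v):
                    self.S[i][j] += p * vj

    def attend(self, q):
        # Local branch: ordinary softmax attention over the window.
        local = [0.0] * self.dim_v
        if self.recent:
            scale = 1.0 / math.sqrt(len(q))
            scores = [scale * sum(a * b for a, b in zip(q, k)) for k, _ in self.recent]
            m = max(scores)
            weights = [math.exp(s - m) for s in scores]
            total = sum(weights)
            for w, (_, v) in zip(weights, self.recent):
                for j, vj in enumerate(v):
                    local[j] += (w / total) * vj
        # History branch: normalized linear attention over evicted tokens.
        phi_q = feature_map(q)
        den = sum(p * zi for p, zi in zip(phi_q, self.z))
        history = [0.0] * self.dim_v
        if den > 0.0:
            for j in range(self.dim_v):
                num = sum(p * self.S[i][j] for i, p in enumerate(phi_q))
                history[j] = num / den
        # How the two outputs are merged is left open (not stated in the abstract).
        return local, history
```

In this sketch the state `S` and `z` never grows with video length, which is one way to read the "negligible memory and computational overhead" claim: each eviction costs one rank-1 update, and each query costs one small matrix-vector product regardless of how many tokens have been absorbed.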

Code & Models

Repositories

leeruibin/hybrid-forcing (GitHub)
