FloodDiffusion: Tailored Diffusion Forcing for Streaming Motion Generation

Yiyi Cai; Yuhan Wu; Kunhang Li; You Zhou; Bo Zheng; Haiyang Liu

arXiv:2512.03520 · cs.CV · February 9, 2026

TL;DR

FloodDiffusion introduces a tailored diffusion forcing framework for real-time, text-driven streaming human motion generation, achieving state-of-the-art results by customizing diffusion training and conditioning methods.

Contribution

The paper presents a diffusion forcing approach adapted specifically for streaming motion generation, addressing the authors' finding that vanilla diffusion forcing (as proposed for video models) fails to model real motion distributions.

Findings

01. Achieved an FID of 0.057 on the HumanML3D benchmark.

02. Demonstrated the importance of bi-directional attention and time scheduling.

03. First to apply diffusion forcing to streaming human motion generation.

Abstract

We present FloodDiffusion, a new framework for text-driven, streaming human motion generation. Given time-varying text prompts, FloodDiffusion generates text-aligned, seamless motion sequences with real-time latency. Unlike existing methods that rely on chunk-by-chunk generation or auto-regressive models with a diffusion head, we adopt a diffusion forcing framework to model this time-series generation task under time-varying control events. We find that a straightforward implementation of vanilla diffusion forcing (as proposed for video models) fails to model real motion distributions. We demonstrate that to guarantee modeling the output distribution, the vanilla diffusion forcing must be tailored to: (i) train with bi-directional attention instead of causal attention; (ii) implement a lower triangular time scheduler instead of a random one; (iii) utilize a continuous time-varying way to introduce…
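
To make point (ii) concrete, the sketch below contrasts independent random per-frame noise levels with one possible reading of a "lower triangular" schedule, in which each frame starts denoising one step after its predecessor. This is an illustration under stated assumptions, not the paper's definition: the function names, the linear ramp, the `window` parameter, and the convention that 1.0 means pure noise and 0.0 means clean are all hypothetical.

```python
import random


def random_noise_levels(num_frames, rng):
    """Random scheduling: each frame draws an independent noise level in [0, 1)."""
    return [rng.random() for _ in range(num_frames)]


def lower_triangular_noise_levels(step, num_frames, window):
    """Assumed staircase schedule (hypothetical, for illustration only).

    Frame i stays at pure noise (1.0) until streaming step i, then ramps
    linearly down to clean (0.0) over `window` steps.
    """
    levels = []
    for i in range(num_frames):
        progress = (step - i) / window  # negative until the frame's turn arrives
        levels.append(min(1.0, max(0.0, 1.0 - progress)))
    return levels


def schedule_matrix(num_frames, window):
    """Rows are streaming steps, columns are frames.

    Entries above the diagonal (frame index > step) are pure noise, so all
    denoising activity sits in the lower triangle of the matrix.
    """
    total_steps = num_frames + window
    return [
        lower_triangular_noise_levels(s, num_frames, window)
        for s in range(total_steps)
    ]
```

Under this reading, calling `schedule_matrix(4, 2)` yields one row of noise levels per streaming step; within every row the levels never decrease from earlier to later frames, whereas `random_noise_levels` gives no such ordering.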

Taxonomy

Topics: Human Motion and Animation · Generative Adversarial Networks and Image Synthesis · 3D Shape Modeling and Analysis