PropFly: Learning to Propagate via On-the-Fly Supervision from Pre-trained Video Diffusion Models

Wonyong Seo; Jaeho Moon; Jaehyup Lee; Soo Ye Kim; Munchurl Kim

arXiv:2602.20583 · cs.CV · February 25, 2026

Open Access

TL;DR

PropFly is a training pipeline for propagation-based video editing that draws on-the-fly supervision from pre-trained video diffusion models, removing the need for large paired video datasets while outperforming state-of-the-art methods on video editing tasks.

Contribution

The paper proposes an on-the-fly supervision method built on pre-trained video diffusion models (VDMs), together with a Guidance-Modulated Flow Matching loss, for efficient, high-quality propagation of video edits.

Findings

1. Outperforms state-of-the-art methods in video editing tasks

2. Produces high-quality, temporally consistent editing results

3. Eliminates the need for large paired video datasets

Abstract

Propagation-based video editing enables precise user control by propagating a single edited frame into subsequent frames while maintaining the original context, such as motion and structure. However, training such models requires large-scale, paired (source and edited) video datasets, which are costly and complex to acquire. Hence, we propose PropFly, a training pipeline for Propagation-based video editing that relies on on-the-Fly supervision from pre-trained video diffusion models (VDMs) instead of requiring off-the-shelf or precomputed paired video editing datasets. Specifically, PropFly leverages one-step clean latent estimations from intermediate noised latents with varying Classifier-Free Guidance (CFG) scales to synthesize diverse pairs of 'source' (low-CFG) and 'edited' (high-CFG) latents on-the-fly. The source latent serves as structural information of the video, while the…
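The pair synthesis described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the rectified-flow parameterization (x_t = (1 - t) * x0 + t * noise, with velocity = noise - x0), the CFG scales 1.0 and 7.0, the random arrays standing in for VDM predictions, and all function names are assumptions made for illustration.

```python
import numpy as np

def cfg_velocity(v_uncond, v_cond, scale):
    """Classifier-free guidance: extrapolate from the unconditional
    prediction toward the conditional one by `scale`."""
    return v_uncond + scale * (v_cond - v_uncond)

def one_step_clean_estimate(x_t, v, t):
    """One-step clean latent estimate.
    ASSUMPTION: x_t = (1 - t) * x0 + t * noise and v = noise - x0,
    which gives x0 = x_t - t * v."""
    return x_t - t * v

def make_pair(x_t, t, v_uncond, v_cond, low_scale=1.0, high_scale=7.0):
    """Build a ('source', 'edited') latent pair from one noised latent
    by decoding it with a low and a high CFG scale (hypothetical values)."""
    source = one_step_clean_estimate(x_t, cfg_velocity(v_uncond, v_cond, low_scale), t)
    edited = one_step_clean_estimate(x_t, cfg_velocity(v_uncond, v_cond, high_scale), t)
    return source, edited

# Toy stand-ins for a real video latent and VDM outputs (all hypothetical).
rng = np.random.default_rng(0)
x0 = rng.normal(size=(4, 8, 8))        # frames x height x width
noise = rng.normal(size=x0.shape)
t = 0.6                                # intermediate noise level
x_t = (1 - t) * x0 + t * noise         # intermediate noised latent

v_cond = noise - x0                    # pretend the conditional prediction is exact
v_uncond = v_cond + 0.1 * rng.normal(size=x0.shape)  # perturbed unconditional prediction

source, edited = make_pair(x_t, t, v_uncond, v_cond)
```

With a guidance scale of 1.0 the guided velocity reduces to the conditional prediction, so in this toy setup the 'source' estimate recovers the clean latent, while the high-scale 'edited' estimate departs from it. Both come from the same noised latent, which is what makes them a pair.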

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Video Analysis and Summarization · Human Pose and Action Recognition