PropFly: Learning to Propagate via On-the-Fly Supervision from Pre-trained Video Diffusion Models
Wonyong Seo, Jaeho Moon, Jaehyup Lee, Soo Ye Kim, Munchurl Kim

TL;DR
PropFly is a training pipeline for propagation-based video editing that draws on-the-fly supervision from pre-trained video diffusion models, removing the need for large paired video datasets. The authors report that it outperforms state-of-the-art methods on video editing tasks.
Contribution
The paper proposes an on-the-fly supervision method built on pre-trained video diffusion models (VDMs), together with a Guidance-Modulated Flow Matching loss, for efficient, high-quality propagation of video edits.
Findings
Outperforms state-of-the-art methods in video editing tasks
Produces high-quality, temporally consistent editing results
Eliminates need for large paired video datasets
Abstract
Propagation-based video editing enables precise user control by propagating a single edited frame into the following frames while maintaining the original context, such as motion and structure. However, training such models requires large-scale paired (source and edited) video datasets, which are costly and complex to acquire. Hence, we propose PropFly, a training pipeline for Propagation-based video editing that relies on on-the-Fly supervision from pre-trained video diffusion models (VDMs) instead of requiring off-the-shelf or precomputed paired video editing datasets. Specifically, PropFly leverages one-step clean latent estimations from intermediate noised latents with varying Classifier-Free Guidance (CFG) scales to synthesize diverse pairs of 'source' (low-CFG) and 'edited' (high-CFG) latents on-the-fly. The source latent serves as structural information of the video, while the…
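The pair-synthesis step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a rectified-flow parameterization, x_t = (1 - t) * x0 + t * noise with velocity v = noise - x0, which the abstract does not state, and it takes the conditional and unconditional velocity predictions as plain arrays instead of calling a real VDM. All function names, the latent shape, and the CFG scales are hypothetical.

```python
import numpy as np


def cfg_velocity(v_uncond, v_cond, scale):
    """Standard classifier-free guidance combination of two predictions."""
    return v_uncond + scale * (v_cond - v_uncond)


def one_step_clean_estimate(x_t, v, t):
    """One-step estimate of the clean latent from a noised latent.

    Assumes x_t = (1 - t) * x0 + t * noise and v = noise - x0,
    so that x0 = x_t - t * v. This convention is an assumption.
    """
    return x_t - t * v


def make_latent_pair(x_t, v_uncond, v_cond, t, low_scale, high_scale):
    """Build a ('source', 'edited') latent pair from one noised latent.

    The low-CFG estimate plays the role of the source latent and the
    high-CFG estimate plays the role of the edited latent.
    """
    source = one_step_clean_estimate(
        x_t, cfg_velocity(v_uncond, v_cond, low_scale), t)
    edited = one_step_clean_estimate(
        x_t, cfg_velocity(v_uncond, v_cond, high_scale), t)
    return source, edited


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    shape = (4, 8, 16, 16)  # hypothetical (frames, channels, height, width)

    x0 = rng.standard_normal(shape)
    noise = rng.standard_normal(shape)
    t = 0.6
    x_t = (1.0 - t) * x0 + t * noise

    # Stand-ins for the VDM outputs: the conditional branch is exact here,
    # the unconditional branch points at a different clean latent.
    v_cond = noise - x0
    v_uncond = noise - rng.standard_normal(shape)

    source, edited = make_latent_pair(
        x_t, v_uncond, v_cond, t, low_scale=1.0, high_scale=7.0)
    print(source.shape, edited.shape)
    print(float(np.abs(edited - source).mean()))
```

Under these assumptions the two estimates share the same noised latent and differ only along the guidance direction, edited - source = -t * (high_scale - low_scale) * (v_cond - v_uncond), so the pair diverges in proportion to how strongly the text condition is emphasized.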
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Video Analysis and Summarization · Human Pose and Action Recognition
