P-Flow: Prompting Visual Effects Generation
Rui Zhao, Mike Zheng Shou

TL;DR
P-Flow is a training-free framework that refines text prompts at test time to accurately generate dynamic visual effects in videos, leveraging vision-language models for high-fidelity customization without modifying the underlying generative model.
Contribution
P-Flow introduces a test-time prompt optimization method for dynamic visual effects in video generation, enabling high-quality customization without retraining the model.
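The summary above describes test-time prompt refinement guided by a vision-language model, but it does not spell out the procedure. The sketch below shows one generic way such a loop could be organized. It is not the authors' algorithm. The names `refine_prompt`, `Critique`, `generate`, `critique`, and `rewrite` are all hypothetical, as are the scoring scheme and the stopping rule. The video generator and the vision-language model are passed in as plain callables, so the generator itself is never modified.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple


@dataclass
class Critique:
    """Hypothetical output of a vision-language model judging one generated video."""
    score: float   # assumed: higher means the effect is closer to the target
    feedback: str  # assumed: free-text advice on how to change the prompt


def refine_prompt(
    initial_prompt: str,
    generate: Callable[[str], Any],        # frozen video generator (assumed interface)
    critique: Callable[[Any], Critique],   # VLM-based judge (assumed interface)
    rewrite: Callable[[str, str], str],    # turns (prompt, feedback) into a new prompt
    max_iters: int = 5,
    target_score: float = 0.9,
) -> Tuple[str, List[Tuple[str, float]]]:
    """Iteratively refine a text prompt without touching the generator's weights.

    Returns the best-scoring prompt seen and the (prompt, score) history.
    """
    prompt = initial_prompt
    best_prompt, best_score = initial_prompt, float("-inf")
    history: List[Tuple[str, float]] = []

    for _ in range(max_iters):
        video = generate(prompt)      # sample a video from the current prompt
        result = critique(video)      # ask the judge how well the effect matches
        history.append((prompt, result.score))

        if result.score > best_score:
            best_prompt, best_score = prompt, result.score
        if result.score >= target_score:
            break                     # good enough, stop early

        prompt = rewrite(prompt, result.feedback)  # revise the prompt and retry

    return best_prompt, history
```

Keeping the best-scoring prompt rather than the last one guards against a rewrite that makes the result worse, which is a design choice of this sketch rather than something stated in the paper summary.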
Findings
Outperforms existing methods in visual effect fidelity and diversity
Effective in both text-to-video and image-to-video tasks
Achieves high-quality effects without model modification
Abstract
Recent advancements in video generation models have significantly improved their ability to follow text prompts. However, the customization of dynamic visual effects, defined as temporally evolving and appearance-driven visual phenomena like object crushing or explosion, remains underexplored. Prior works on motion customization or control mainly focus on low-level motions of the subject or camera, which can be guided using explicit control signals such as motion trajectories. In contrast, dynamic visual effects involve higher-level semantics that are more naturally suited for control via text prompts. However, it is hard and time-consuming for humans to craft a single prompt that accurately specifies these effects, as they require complex temporal reasoning and iterative refinement over time. To address this challenge, we propose P-Flow, a novel training-free framework for customizing…
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Human Motion and Animation
