Force Prompting: Video Generation Models Can Learn and Generalize Physics-based Control Signals
Nate Gillman, Charles Herrmann, Michael Freeman, Daksh Aggarwal, Evan Luo, Deqing Sun, Chen Sun

TL;DR
This paper introduces force prompts for video generation models, which let generated videos respond realistically to physical interactions such as poking and wind without requiring 3D assets or a physics simulator at inference time. The approach generalizes strongly from limited training data.
Contribution
The work presents a novel method for incorporating physical force control into video generation models, showing they can learn and generalize physics-based interactions from synthetic data.
Findings
Models can respond realistically to localized and global forces.
High-quality physics interactions are achievable with limited training data.
Visual diversity and text keywords are key to generalization.
Abstract
Recent advances in video generation models have sparked interest in world models capable of simulating realistic environments. While navigation has been well-explored, physically meaningful interactions that mimic real-world forces remain largely understudied. In this work, we investigate using physical forces as a control signal for video generation and propose force prompts which enable users to interact with images through both localized point forces, such as poking a plant, and global wind force fields, such as wind blowing on fabric. We demonstrate that these force prompts can enable videos to respond realistically to physical control signals by leveraging the visual and motion prior in the original pretrained model, without using any 3D asset or physics simulator at inference. The primary challenge of force prompting is the difficulty in obtaining high quality paired force-video…
Taxonomy
Topics: Model Reduction and Neural Networks · Advanced Vision and Imaging · Control Systems and Identification
Methods: RoIPool · Softmax · RoIAlign
