PhyCo: Learning Controllable Physical Priors for Generative Motion
Sriram Narayanan, Ziyu Jiang, Srinivasa Narasimhan, Manmohan Chandraker

TL;DR
PhyCo is a framework that adds controllable, physically consistent generation to video diffusion models by combining a large simulation dataset, physics-supervised fine-tuning, and VLM-guided reward optimization.
Contribution
PhyCo introduces a scalable method that combines simulation data, physics supervision, and vision-language model feedback to produce physically realistic, controllable videos without requiring geometry reconstruction.
Findings
PhyCo outperforms baselines on the Physics-IQ benchmark.
Human studies show improved physical realism and control.
The approach generalizes beyond synthetic training environments.
Abstract
Modern video diffusion models excel at appearance synthesis but still struggle with physical consistency: objects drift, collisions lack realistic rebound, and material responses seldom match their underlying properties. We present PhyCo, a framework that introduces continuous, interpretable, and physically grounded control into video generation. Our approach integrates three key components: (i) a large-scale dataset of over 100K photorealistic simulation videos where friction, restitution, deformation, and force are systematically varied across diverse scenarios; (ii) physics-supervised fine-tuning of a pretrained diffusion model using a ControlNet conditioned on pixel-aligned physical property maps; and (iii) VLM-guided reward optimization, where a fine-tuned vision-language model evaluates generated videos with targeted physics queries and provides differentiable feedback. This…
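To make component (ii) concrete, here is a minimal sketch of what "pixel-aligned physical property maps" could look like as a conditioning input. It is an illustration under assumptions, not the authors' implementation: the function name `build_property_maps`, the channel order, the per-object mask inputs, and the zero background value are all hypothetical, since the abstract names the four properties but not how they are encoded.

```python
import numpy as np

# Hypothetical channel order. The abstract lists these four properties
# but does not say how they are laid out or normalized.
PROPERTY_CHANNELS = ("friction", "restitution", "deformation", "force")


def build_property_maps(masks, properties, background=0.0):
    """Rasterize per-object scalar properties into a (C, H, W) array.

    masks:      dict of object name -> boolean (H, W) segmentation mask
    properties: dict of object name -> dict of property name -> float
    background: value written where no object is present (an assumption)
    """
    height, width = next(iter(masks.values())).shape
    maps = np.full(
        (len(PROPERTY_CHANNELS), height, width), background, dtype=np.float32
    )
    for name, mask in masks.items():
        for channel, prop in enumerate(PROPERTY_CHANNELS):
            # Every pixel covered by the object carries that object's value,
            # so the map stays aligned with the video frame.
            maps[channel][mask] = properties[name].get(prop, background)
    return maps


# Toy scene: a 2x2 "ball" above a one-pixel-high "floor" in a 4x6 frame.
ball = np.zeros((4, 6), dtype=bool)
ball[1:3, 1:3] = True
floor = np.zeros((4, 6), dtype=bool)
floor[3, :] = True

maps = build_property_maps(
    {"ball": ball, "floor": floor},
    {
        "ball": {"friction": 0.2, "restitution": 0.8},
        "floor": {"friction": 0.5},
    },
)
print(maps.shape)
```

An array of this kind could be stacked per frame and passed to a ControlNet-style branch as the conditioning signal. How PhyCo actually encodes, normalizes, or tracks these maps over time is not stated in the excerpt above.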
