Objects in Generated Videos Are Slower Than They Appear: Models Suffer Sub-Earth Gravity and Don't Know Galileo's Principle...for now
Varun Varma Thozhiyoor, Shivam Tripathi, Venkatesh Babu Radhakrishnan, Anand Bhattad

TL;DR
This paper investigates how current video generation models fail to accurately represent gravity, revealing violations of physical principles like Galileo's, and proposes a minimal-data adaptation to improve their physical realism.
Contribution
The study introduces a unit-free protocol to rigorously test physical laws in generated videos and demonstrates a simple fine-tuning method to partially correct gravity representation.
Findings
Video generators render falling objects with a lower effective acceleration than Earth's gravity.
Temporal rescaling cannot fix gravity artifacts.
A lightweight adaptor raises the effective gravitational acceleration from 1.81 to 6.43 m/s^2, closer to Earth's roughly 9.81 m/s^2.
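To make "effective gravitational acceleration" concrete, here is a minimal sketch of how such a number could be estimated from a tracked trajectory. It is not the authors' pipeline. It assumes the vertical positions are already in metres and the frame rate is known, which are exactly the scale assumptions the abstract flags as confounds. The function name `effective_gravity` and the synthetic data are illustrative only.

```python
import numpy as np

def effective_gravity(y_m, fps):
    """Fit y(t) = y0 + v0*t + 0.5*g*t^2 to per-frame positions.

    y_m : vertical positions in metres, positive downward (assumed known scale)
    fps : assumed frame rate in frames per second
    Returns the fitted acceleration g in m/s^2.
    """
    t = np.arange(len(y_m)) / fps
    quad_coeff = np.polyfit(t, y_m, 2)[0]  # highest-order coefficient first
    return 2.0 * quad_coeff

# Synthetic one-second drop from rest under Earth gravity, sampled at 24 fps.
fps = 24.0
t = np.arange(24) / fps
y = 0.5 * 9.81 * t**2

g_est = effective_gravity(y, fps)

# Same frames, but the frame rate is assumed to be twice as high:
# every timestamp halves, so the fitted acceleration grows by a factor of 4.
g_wrong_fps = effective_gravity(y, 2 * fps)
```

A wrong frame-rate assumption therefore multiplies the estimate by one constant factor for the whole clip, so on its own it cannot explain errors that vary from video to video.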
Abstract
Video generators are increasingly evaluated as potential world models, which requires them to encode and understand physical laws. We investigate their representation of a fundamental law: gravity. Out-of-the-box video generators consistently generate objects falling at an effectively slower acceleration. However, these physical tests are often confounded by ambiguous metric scale. We first investigate if observed physical errors are artifacts of these ambiguities (e.g., incorrect frame rate assumptions). We find that even temporal rescaling cannot correct the high-variance gravity artifacts. To rigorously isolate the underlying physical representation from these confounds, we introduce a unit-free, two-object protocol that tests the timing ratio $t_1^2/t_2^2 = h_1/h_2$, a relationship independent of $g$, focal length, and scale. This relative test reveals violations of Galileo's…
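The idea behind a unit-free, two-object test can be sketched as follows. For an object dropped from rest, the fall time satisfies t = sqrt(2h/g), so for two objects in the same scene the ratio of squared fall times equals the ratio of drop heights, and g cancels. The sketch below is an illustration under that free-fall-from-rest assumption, not the authors' protocol. The helper names `fall_time` and `galileo_ratio_error` are hypothetical.

```python
import math

def fall_time(h, g):
    """Time to fall a height h from rest under constant acceleration g."""
    return math.sqrt(2.0 * h / g)

def galileo_ratio_error(t1, t2, h1, h2):
    """Relative deviation of (t1^2 / t2^2) from (h1 / h2).

    Times may be in frames and heights in pixels: any common unit
    cancels in each ratio, so no metric scale or frame rate is needed.
    Returns 0.0 when the two falls are mutually consistent.
    """
    return (t1**2 / t2**2) / (h1 / h2) - 1.0

h1, h2 = 2.0, 0.5  # two drop heights (any shared unit)

# Consistent free fall passes the test for any g, including sub-Earth values.
err_earth = galileo_ratio_error(fall_time(h1, 9.81), fall_time(h2, 9.81), h1, h2)
err_slow = galileo_ratio_error(fall_time(h1, 1.81), fall_time(h2, 1.81), h1, h2)

# A global time rescaling (e.g. a wrong frame rate) also leaves the ratio intact.
k = 2.5
err_rescaled = galileo_ratio_error(k * fall_time(h1, 1.81), k * fall_time(h2, 1.81), h1, h2)

# If the two objects fall with different accelerations, the test fails.
err_violation = galileo_ratio_error(fall_time(h1, 9.81), fall_time(h2, 1.81), h1, h2)
```

Because a uniformly slow but self-consistent fall still yields zero error, a nonzero value points to an inconsistency between the two falls rather than to a wrong unit or frame-rate guess.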
Taxonomy
Topics: Music Technology and Sound Studies · Generative Adversarial Networks and Image Synthesis · Visual perception and processing mechanisms
