TL;DR
This paper introduces Visual Chronometer, a predictor that estimates the physical frame rate of a video from its visual dynamics. The estimate exposes temporal inconsistency in generative video models and can be used to correct it, which improves the perceived naturalness of generated videos.
Contribution
It proposes a predictor that recovers the Physical Frames Per Second (PhyFPS) of a video directly from visual motion, along with benchmarks to evaluate temporal stability in video generation.
Findings
State-of-the-art generators suffer from severe PhyFPS misalignment.
Applying PhyFPS correction improves perceived naturalness of AI-generated videos.
The method accurately estimates true temporal scale from motion cues.
Abstract
While recent generative video models have achieved remarkable visual realism and are being explored as world models, true physical simulation requires mastering both space and time. Current models can produce visually smooth kinematics, yet they lack a reliable internal motion pulse to ground these motions in a consistent, real-world time scale. This temporal ambiguity stems from the common practice of indiscriminately training on videos with vastly different real-world speeds, forcing them into standardized frame rates. This leads to what we term chronometric hallucination: generated sequences exhibit ambiguous, unstable, and uncontrollable physical motion speeds. To address this, we propose Visual Chronometer, a predictor that recovers the Physical Frames Per Second (PhyFPS) directly from the visual dynamics of an input video. Trained via controlled temporal resampling, our method…
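The abstract is cut off before it describes the training recipe, so the following is only a minimal sketch of what "controlled temporal resampling" could look like, not the authors' implementation. It assumes a source clip with a known capture rate, keeps every `stride`-th frame, and labels the result with `source_fps / stride` as its physical frame rate. The helper names `resample_with_label` and `apparent_speed` are hypothetical, as is the idea of reading the mismatch between container FPS and PhyFPS as a playback-speed factor.

```python
from fractions import Fraction


def resample_with_label(frames, source_fps, stride):
    """Keep every `stride`-th frame and return (clip, physical_fps).

    Assumption: the source was captured in real time at `source_fps`,
    so each kept frame now spans stride / source_fps seconds of real time.
    """
    if stride < 1:
        raise ValueError("stride must be a positive integer")
    return frames[::stride], Fraction(source_fps) / stride


def apparent_speed(container_fps, phy_fps):
    """Playback speed relative to real time when a clip whose frames were
    sampled at `phy_fps` is played back at `container_fps`."""
    return Fraction(container_fps) / Fraction(phy_fps)


# Stand-in for 2 seconds of 120 fps footage (integers instead of images).
frames = list(range(240))
clip, phy_fps = resample_with_label(frames, source_fps=120, stride=4)

# The resampled clip has 60 frames and a physical rate of 30 fps.
# Stored in a 24 fps container it plays at 24/30 of real-time speed.
speed = apparent_speed(24, phy_fps)
```

Under these assumptions, a predictor trained on such `(clip, physical_fps)` pairs sees the same scene at several known time scales, and playing a clip back at its predicted PhyFPS instead of the container rate would bring `apparent_speed` back to 1.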
