Physics-Grounded Motion Forecasting via Equation Discovery for Trajectory-Guided Image-to-Video Generation
Tao Feng, Xianbing Zhao, Zhenhua Chen, Tien Tsin Wong, Hamid Rezatofighi, Gholamreza Haffari, Lizhen Qu

TL;DR
This paper presents a framework combining symbolic regression and trajectory-guided image-to-video models to generate physically accurate future object trajectories, improving the realism of generated videos by adhering to physical laws.
Contribution
It introduces a novel method that discovers equations of motion from videos and guides video generation without fine-tuning, enhancing physical realism in generated videos.
Findings
Successfully recovers ground-truth equations in classical mechanics scenarios.
Improves physical alignment of generated videos over baseline methods.
Demonstrates effectiveness on spring-mass systems, pendulums, and projectile motion.
Abstract
Recent advances in diffusion-based and autoregressive video generation models have achieved remarkable visual realism. However, these models typically lack accurate physical alignment, failing to replicate real-world dynamics in object motion. This limitation arises primarily from their reliance on learned statistical correlations rather than capturing mechanisms adhering to physical laws. To address this issue, we introduce a novel framework that integrates symbolic regression (SR) and trajectory-guided image-to-video (I2V) models for physics-grounded video forecasting. Our approach extracts motion trajectories from input videos, uses a retrieval-based pre-training mechanism to enhance symbolic regression, and discovers equations of motion to forecast physically accurate future trajectories. These trajectories then guide video generation without requiring fine-tuning of existing…
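The forecasting step described in the abstract (fit an equation of motion to an observed trajectory, then extrapolate it) can be illustrated with a minimal sketch. This is not the authors' ReSR method or their equation bank: the candidate forms, the function names (make_bank, fit_best, forecast), the frequency grid, and the complexity penalty are all assumptions chosen for illustration, and the "bank" here is just three hand-written families that are linear in their coefficients so that ordinary least squares suffices.

```python
import numpy as np

# Hypothetical candidate bank (an assumption, not the paper's equation bank).
# Each entry maps times t to a design matrix whose columns are basis terms;
# the coefficients of those terms are fit by linear least squares.
def make_bank(omegas=(2.0, 3.0, 4.0)):
    bank = {
        "constant_velocity: a + b*t":
            lambda t: np.stack([np.ones_like(t), t], axis=1),
        "projectile: a + b*t + c*t^2":
            lambda t: np.stack([np.ones_like(t), t, t**2], axis=1),
    }
    for w in omegas:  # small fixed grid of angular frequencies (assumed)
        bank[f"oscillator(w={w}): a + b*cos(w*t) + c*sin(w*t)"] = (
            lambda t, w=w: np.stack(
                [np.ones_like(t), np.cos(w * t), np.sin(w * t)], axis=1
            )
        )
    return bank

def fit_best(t_obs, x_obs, bank, penalty=1e-3):
    """Fit every candidate and keep the one with the lowest penalized error."""
    best = None
    for name, features in bank.items():
        A = features(t_obs)
        coef, *_ = np.linalg.lstsq(A, x_obs, rcond=None)
        mse = float(np.mean((A @ coef - x_obs) ** 2))
        score = mse + penalty * A.shape[1]  # crude complexity penalty (assumed)
        if best is None or score < best[0]:
            best = (score, name, features, coef)
    _, name, features, coef = best
    return name, features, coef

def forecast(features, coef, t_future):
    """Evaluate the selected equation at future time stamps."""
    return features(t_future) @ coef

# Synthetic vertical projectile track: y(t) = 1 + 4 t - 0.5 * 9.81 * t^2.
t_obs = np.linspace(0.0, 0.5, 20)
y_obs = 1.0 + 4.0 * t_obs - 4.905 * t_obs**2

name, features, coef = fit_best(t_obs, y_obs, make_bank())
y_future = forecast(features, coef, np.linspace(0.5, 1.0, 10))
print(name)
```

In the paper's pipeline, forecasted positions like y_future would be passed to a trajectory-guided I2V model as motion guidance. A real symbolic regression search explores a far larger expression space than this fixed list, which is where the retrieval-based initialization is reported to help.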
Peer Reviews
Decision: ICLR 2026 Conference Withdrawn Submission
1. The core concept of connecting video analysis, symbolic equation discovery, and generative video models into a single pipeline is a novel approach to addressing the lack of physical realism in video generation. 2. The system's intermediate representation is an explicit symbolic equation. This is more explainable than black-box neural network predictors and provides genuine insight into the system's dynamics.
1. The method is evaluated exclusively on simple, 2D classical mechanics problems (springs, pendulums) in controlled, static-background lab settings. There is no evidence it can scale to complex, real-world 3D scenarios involving camera motion, occlusions, or non-trivial dynamics. 2. The ReSR method's success is predicated on the assumption that an equation similar to the true governing law already exists in its curated bank. This approach will not generalize to discovering novel physics or comp
1. Novel neuro-symbolic integration for video generation. The paper presents an innovative approach combining interpretable symbolic equation discovery with data-driven video generation models, bridging two typically separate domains. The ReSR method with retrieval-based pre-training from a physics equation bank is a well-motivated contribution that significantly improves convergence speed (44.3 iterations vs 61.4 for PySR) and equation accuracy (TED 0.80 vs 0.47) as shown in Table 1. 2. Compreh
1. Equation bank construction relies heavily on manual curation. Section 3.3 describes constructing the equation bank by manually adapting 106 Feynman equations through time-variable substitution where "time-dependent variables (e.g., velocity, acceleration, momentum) with the time variable t" and "variables that are independent of time (e.g., mass, density) are replaced with constant values (e.g., 10)." 2. The trajectory extraction pipeline uses heuristic selection. The paper extracts traject
- A better symbolic regression method. The authors analyze the drawbacks of previous symbolic regression methods, such as PySR and LaSR, including large search spaces and slow convergence, and propose ReSR. This method is based on a carefully designed, pre-defined equation bank rather than random candidate equations, which improves convergence speed and achieves superior results, significantly outperforming the baselines. - It demonstrates the performance of trajectory-guided video generation
- The ability to generate physically-aligned videos without ground truth should be proven. In this work, input videos are necessary to obtain analytical equations and predict future changes. This method cannot generate motion-consistent trajectories based solely on the initial frame, which limits its application scope. - The method's extensibility has yet to be demonstrated. In the introduction, the authors mention that "insights into object motion in classical mechanics can be easily extende
1. Clear pipeline; inference-only control ensures the improvements stem from the motion law rather than re-training the generator. The paper reports successful recovery of ground-truth equations and improved global motion consistency. 2. The overview and per-module descriptions are easy to follow; classical-mechanics scope and assumptions are explicit.
1. Experiments are limited to controlled lab videos of simple systems (single-object classical mechanics). It’s unclear how the method handles multi-object scenes, contacts/collisions, or non-rigid effects beyond the chosen examples. 2. Tracking noise/occlusions can corrupt trajectories; sensitivity of SR to noisy, short observations is not analyzed. (Method description assumes clean tracks.) 3. The attached showcase videos are not convincing to me. The quality of the video is low and some objec
Taxonomy
Topics: Model Reduction and Neural Networks · Generative Adversarial Networks and Image Synthesis · Human Motion and Animation
