NEWTON: Agentic Planning for Physically Grounded Video Generation

Yuxiang Feng; Juncheng Wang; Chao Xu; Yijie Qian; Huihan Wang; Wenlong Hou; Yang Liu; Baigui Sun; Yong Liu; Shujun Wang

arXiv:2605.18396 · cs.CV · May 20, 2026

TL;DR

NEWTON is an agentic planning framework for physically grounded video generation: a planner orchestrates physics-aware tools and a verifier iteratively checks the result, raising reported accuracy from 21.4% to 29.7% on LTX-Video and from 30.7% to 37.4% on Veo-3.1.

Contribution

The paper proposes a planning-based approach in which a learned planner builds rich physical conditioning with tools and a verifier closes the loop, addressing the specification bottleneck: text prompts omit the parameters that determine the dynamics.

Findings

01  Improved joint accuracy from 21.4% to 29.7% on LTX-Video.

02  Enhanced accuracy from 30.7% to 37.4% on Veo-3.1.

03  Demonstrated effectiveness without modifying the underlying generator.
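
To put the two reported improvements on a common footing, the short sketch below computes the absolute gain (in percentage points) and the relative gain for each generator. The figures are copied from the findings above; the names `reported` and `gains` are illustrative only and do not come from the paper.

```python
# Accuracy figures (percent, before and after) as reported in the findings above.
reported = {
    "LTX-Video": (21.4, 29.7),
    "Veo-3.1": (30.7, 37.4),
}


def gains(before, after):
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute = after - before
    relative = 100.0 * absolute / before
    return round(absolute, 1), round(relative, 1)


summary = {name: gains(before, after) for name, (before, after) in reported.items()}

for name, (abs_points, rel_percent) in summary.items():
    print(f"{name}: +{abs_points} points ({rel_percent}% relative)")
```

The absolute gains are 8.3 and 6.7 percentage points respectively; the relative gain is larger on LTX-Video because its baseline is lower.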

Abstract

Video generation models produce visually compelling results but systematically violate physical commonsense -- on VideoPhy-2, the best model achieves only 32.6% joint accuracy. We identify a specification bottleneck: text prompts are a lossy compression of the physical world, omitting the parameters that fully determine dynamics, and no amount of model scaling can recover what was never specified. From this diagnosis we derive three properties that physics conditioning must satisfy -- sufficiency, dynamism, and verifiability -- and show that no existing approach satisfies all three. We present NEWTON, in which video generation is demoted from the system output to one action inside an agent's toolbox: a learned planner orchestrates physics-aware tools (keyframe generation, scientific computation, prompt refinement) to construct rich conditioning, and a verifier closes the loop for…
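
The control flow described in the abstract (a planner choosing among tools, with video generation as just one of them, and a verifier feeding back into planning) can be illustrated with a toy loop. This is a minimal sketch under heavy assumptions, not the authors' implementation: every name here (`State`, `plan`, `verify`, `run`, the tool functions) is hypothetical, the learned planner is replaced by a hand-written rule, the tools are stubs that write placeholder strings, and the verifier only checks that conditioning is present rather than assessing physical plausibility.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class State:
    prompt: str
    conditioning: Dict[str, str] = field(default_factory=dict)
    video: Optional[str] = None
    feedback: List[str] = field(default_factory=list)  # items the verifier found missing
    history: List[str] = field(default_factory=list)   # actions taken so far


# Stub tools (hypothetical): each writes a placeholder into the conditioning.
def refine_prompt(s: State) -> None:
    s.conditioning["refined_prompt"] = s.prompt + " [physical parameters made explicit]"


def compute_physics(s: State) -> None:
    s.conditioning["trajectory"] = "<computed trajectory>"


def generate_keyframes(s: State) -> None:
    s.conditioning["keyframes"] = "<keyframes>"


def generate_video(s: State) -> None:
    # The generator is one action among several, not the system output.
    s.video = f"video conditioned on {sorted(s.conditioning)}"


TOOLS: Dict[str, Callable[[State], None]] = {
    "refine_prompt": refine_prompt,
    "compute_physics": compute_physics,
    "generate_keyframes": generate_keyframes,
    "generate_video": generate_video,
}

# Which conditioning key each physics-aware tool supplies.
PROVIDES = {
    "refine_prompt": "refined_prompt",
    "compute_physics": "trajectory",
    "generate_keyframes": "keyframes",
}


def plan(s: State) -> str:
    """Rule-based stand-in for the learned planner: fix what the verifier flagged."""
    for tool, key in PROVIDES.items():
        if key in s.feedback and key not in s.conditioning:
            return tool
    return "generate_video"


def verify(s: State) -> List[str]:
    """Toy verifier: report which conditioning items are still missing."""
    return [key for key in PROVIDES.values() if key not in s.conditioning]


def run(prompt: str, max_steps: int = 10) -> State:
    s = State(prompt)
    for _ in range(max_steps):
        action = plan(s)
        TOOLS[action](s)
        s.history.append(action)
        if action == "generate_video":
            s.feedback = verify(s)  # closing the loop
            if not s.feedback:
                break
    return s


result = run("a ball rolls off a table")
print(result.history)
```

In this toy run the first generation attempt is rejected because no conditioning exists yet, the planner then calls the three stub tools, and the second attempt is accepted. The generator itself is never modified, which mirrors the third finding above.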

Code & Models

Repositories

https://Newton026.github.io/newton