VisionLaw: Inferring Interpretable Intrinsic Dynamics from Visual Observations via Bilevel Optimization

Jiajing Lin; Shu Jiang; Qingyuan Zeng; Zhenzhong Wang; Min Jiang

arXiv:2508.13792 · cs.CV · April 13, 2026

TL;DR

VisionLaw is a bilevel optimization framework that infers interpretable intrinsic dynamics of objects from visual observations. LLMs generate candidate constitutive laws, and a vision-guided evaluation scores them to improve accuracy.

Contribution

VisionLaw introduces a novel LLM-driven decoupled constitutive evolution strategy and a vision-guided evaluation mechanism for inferring intrinsic dynamics from visual data.
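
The text above does not state what the strategy decouples. For illustration only, assume the constitutive law splits into an elastic part and a plastic part that are searched one at a time. Under that assumption, the sketch below compares the number of candidate evaluations for a joint search with the number for a decoupled (coordinate-wise) search. The candidate names, `score`, and `TARGET` are hypothetical stand-ins, not VisionLaw's code.

```python
from itertools import product
from typing import Optional, Tuple

# Hypothetical candidate pools; in VisionLaw, candidates are generated by an LLM.
ELASTIC = ["linear", "neo_hookean", "corotated"]
PLASTIC = ["none", "von_mises", "drucker_prager"]
TARGET = ("corotated", "von_mises")  # hidden "true" law for this toy example


def score(elastic: str, plastic: str) -> float:
    """Toy stand-in for a simulation-based loss (lower is better).

    It is separable: each component contributes to the loss independently.
    """
    return float((elastic != TARGET[0]) + (plastic != TARGET[1]))


def joint_search() -> Tuple[Tuple[str, str], int]:
    """Evaluate every (elastic, plastic) pair: |E| * |P| evaluations."""
    best: Optional[Tuple[float, str, str]] = None
    evals = 0
    for e, p in product(ELASTIC, PLASTIC):
        evals += 1
        s = score(e, p)
        if best is None or s < best[0]:
            best = (s, e, p)
    assert best is not None
    return (best[1], best[2]), evals


def decoupled_search(init_plastic: str = "none") -> Tuple[Tuple[str, str], int]:
    """Search each component with the other held fixed: |E| + |P| evaluations."""
    evals = 0
    # Stage 1: pick the elastic part while the plastic part stays at its initial value.
    best_e = min(ELASTIC, key=lambda c: score(c, init_plastic))
    evals += len(ELASTIC)
    # Stage 2: pick the plastic part given the chosen elastic part.
    best_p = min(PLASTIC, key=lambda c: score(best_e, c))
    evals += len(PLASTIC)
    return (best_e, best_p), evals


if __name__ == "__main__":
    print(joint_search())
    print(decoupled_search())
```

Here the decoupled search needs 3 + 3 = 6 evaluations instead of 3 × 3 = 9, and the gap grows multiplicatively with larger pools. It recovers the joint optimum only because the toy `score` is separable. When the components interact, a single coordinate-wise pass can settle on a suboptimal pair and may need several alternating rounds.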

Findings

01. Outperforms existing methods on synthetic and real datasets.
02. Demonstrates strong generalization to novel interactive scenarios.
03. Infers interpretable and physically plausible intrinsic dynamics.

Abstract

The intrinsic dynamics of an object governs its physical behavior in the real world, playing a critical role in enabling physically plausible interactive simulation with 3D assets. Existing methods have attempted to infer the intrinsic dynamics of objects from visual observations, but generally face two major challenges: one line of work relies on manually defined constitutive priors, making it difficult to align with actual intrinsic dynamics; the other models intrinsic dynamics using neural networks, resulting in limited interpretability and poor generalization. To address these challenges, we propose VisionLaw, a bilevel optimization framework that infers interpretable expressions of intrinsic dynamics from visual observations. At the upper level, we introduce an LLMs-driven decoupled constitutive evolution strategy, where LLMs are prompted to act as physics experts to generate and…
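
Structurally, the bilevel scheme can be read as a nested loop. The upper level proposes candidate law expressions, and the lower level fits each candidate's continuous parameters and reports how well the candidate matches the observations. The sketch below is a toy built on stated assumptions. A fixed dictionary `CANDIDATES` replaces the LLM. A closed-form least-squares fit on synthetic 1D stress-strain data replaces the vision-guided evaluation. All names (`CANDIDATES`, `lower_level_fit`, `upper_level_select`) are hypothetical.

```python
import math
from typing import Callable, Dict, List, Tuple

# Hypothetical upper-level proposals: each law is stress = theta * g(strain),
# standing in for symbolic expressions an LLM might generate.
CANDIDATES: Dict[str, Callable[[float], float]] = {
    "linear": lambda e: e,
    "cubic": lambda e: e ** 3,
    "log": lambda e: math.log1p(e),
}


def lower_level_fit(
    g: Callable[[float], float], xs: List[float], ys: List[float]
) -> Tuple[float, float]:
    """Lower level: fit the scalar parameter theta by least squares.

    Returns (theta, mean squared error). The MSE stands in for a
    vision-based loss between simulated and observed behavior.
    """
    gx = [g(x) for x in xs]
    theta = sum(a * y for a, y in zip(gx, ys)) / sum(a * a for a in gx)
    mse = sum((theta * a - y) ** 2 for a, y in zip(gx, ys)) / len(xs)
    return theta, mse


def upper_level_select(xs: List[float], ys: List[float]) -> Tuple[str, float]:
    """Upper level: keep the candidate whose best-fit parameter explains the data best."""
    results = {name: lower_level_fit(g, xs, ys) for name, g in CANDIDATES.items()}
    best = min(results, key=lambda name: results[name][1])
    return best, results[best][0]


if __name__ == "__main__":
    strains = [0.1 * i for i in range(1, 11)]
    observed = [2.5 * x ** 3 for x in strains]  # synthetic data from a cubic law
    print(upper_level_select(strains, observed))
```

The key design point is that the upper level never sees raw parameters. It compares candidates only by their lower-level losses after each candidate has been fitted as well as possible. A fuller loop would feed these losses back to the proposer over several rounds, whereas this sketch performs a single selection step.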

Videos

VisionLaw: Inferring Interpretable Intrinsic Dynamics from Visual Observations via Bilevel Optimization (SlidesLive)