TL;DR
VisionLaw is a bilevel optimization framework that infers interpretable intrinsic dynamics of objects from visual observations, using LLMs to generate constitutive laws and vision-guided evaluation to improve accuracy.
Contribution
Introduces an LLMs-driven decoupled constitutive evolution strategy and a vision-guided evaluation mechanism for inferring intrinsic dynamics from visual data.
Findings
Outperforms existing methods on synthetic and real datasets.
Demonstrates strong generalization in novel interactive scenarios.
Effectively infers interpretable and physically plausible intrinsic dynamics.
Abstract
The intrinsic dynamics of an object governs its physical behavior in the real world, playing a critical role in enabling physically plausible interactive simulation with 3D assets. Existing methods have attempted to infer the intrinsic dynamics of objects from visual observations, but generally face two major challenges: one line of work relies on manually defined constitutive priors, making it difficult to align with actual intrinsic dynamics; the other models intrinsic dynamics using neural networks, resulting in limited interpretability and poor generalization. To address these challenges, we propose VisionLaw, a bilevel optimization framework that infers interpretable expressions of intrinsic dynamics from visual observations. At the upper level, we introduce an LLMs-driven decoupled constitutive evolution strategy, where LLMs are prompted to act as physics experts to generate and…
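The bilevel structure described in the abstract can be illustrated with a toy sketch. This is not VisionLaw's implementation: a fixed candidate list stands in for the LLM proposals at the upper level, a one-dimensional stress-strain fit stands in for the vision-guided evaluation, and every name below (fit_scale, bilevel_search, CANDIDATES) is hypothetical.

```python
# Toy bilevel search: the outer level chooses the form of a constitutive law,
# the inner level fits that form's parameter to observations.
# Assumptions: a fixed candidate list replaces LLM-generated proposals, and a
# 1-D stress-strain mean squared error replaces vision-guided evaluation.

def fit_scale(basis, strains, stresses):
    """Inner level: closed-form least squares for k in stress = k * basis(strain)."""
    feats = [basis(e) for e in strains]
    denom = sum(f * f for f in feats)
    k = sum(f * s for f, s in zip(feats, stresses)) / denom
    loss = sum((k * f - s) ** 2 for f, s in zip(feats, stresses)) / len(strains)
    return k, loss

def bilevel_search(candidates, strains, stresses):
    """Outer level: fit every candidate form and keep the one with the lowest loss."""
    best = None
    for name, basis in candidates.items():
        k, loss = fit_scale(basis, strains, stresses)
        if best is None or loss < best[2]:
            best = (name, k, loss)
    return best

# Hypothetical candidate forms, each a readable symbolic expression.
CANDIDATES = {
    "linear": lambda e: e,
    "cubic": lambda e: e ** 3,
    "linear_plus_cubic": lambda e: e + e ** 3,
}

# Synthetic "observations" generated from a known law with k = 2.
strains = [0.1 * i for i in range(1, 11)]
stresses = [2.0 * (e + e ** 3) for e in strains]

best_name, best_k, best_loss = bilevel_search(CANDIDATES, strains, stresses)
print(best_name, best_k, best_loss)
```

Because each candidate is an explicit expression rather than a network, the selected law stays readable, which is the kind of interpretability the abstract contrasts with neural-network models.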