ContactGaussian-WM: Learning Physics-Grounded World Model from Videos
Meizhong Wang, Wanxin Jin, Kun Cao, Lihua Xie, Yiguang Hong

TL;DR
ContactGaussian-WM introduces a physics-grounded world model that learns complex physical interactions directly from sparse, contact-rich videos, enhancing robotic planning and simulation capabilities.
Contribution
It presents a novel differentiable physics engine with a Gaussian representation for visual and collision data, enabling learning from limited data and complex contact scenarios.
Findings
Outperforms state-of-the-art methods in simulations and real-world tests.
Demonstrates robust generalization to complex physical scenarios.
Enables practical applications like data synthesis and real-time MPC.
Abstract
Developing world models that understand complex physical interactions is essential for advancing robotic planning and simulation. However, existing methods often struggle to accurately model the environment under conditions of data scarcity and complex contact-rich dynamic motion. To address these challenges, we propose ContactGaussian-WM, a differentiable physics-grounded rigid-body world model capable of learning intricate physical laws directly from sparse and contact-rich video sequences. Our framework consists of two core components: (1) a unified Gaussian representation for both visual appearance and collision geometry, and (2) an end-to-end differentiable learning framework that differentiates through a closed-form physics engine to infer physical properties from sparse visual observations. Extensive simulations and real-world evaluations demonstrate that ContactGaussian-WM…
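To make the idea of "differentiating through a closed-form physics engine to infer physical properties" concrete, here is a toy sketch. It is not the paper's engine or its Gaussian representation: it assumes a single block sliding on a flat surface with Coulomb friction, assumes the block is still moving in every observed frame, and uses a hand-derived gradient in place of automatic differentiation. The names `rollout`, `loss_and_grad`, and `fit_friction` are hypothetical, and the "observed" positions are synthetic stand-ins for trajectories that would otherwise be extracted from video.

```python
G = 9.81  # gravitational acceleration in m/s^2


def rollout(mu, v0, times):
    """Closed-form position of a sliding block: x(t) = v0*t - 0.5*mu*G*t^2.

    Valid only while the block is still moving (an assumption of this sketch).
    """
    return [v0 * t - 0.5 * mu * G * t * t for t in times]


def loss_and_grad(mu, v0, times, observed):
    """Sum of squared position errors and its derivative with respect to mu."""
    loss = 0.0
    grad = 0.0
    for t, pred, obs in zip(times, rollout(mu, v0, times), observed):
        residual = pred - obs
        loss += residual * residual
        # d(pred)/d(mu) = -0.5 * G * t^2, so the chain rule gives:
        grad += 2.0 * residual * (-0.5 * G * t * t)
    return loss, grad


def fit_friction(v0, times, observed, mu_init=0.1, lr=0.1, steps=100):
    """Plain gradient descent on the friction coefficient."""
    mu = mu_init
    for _ in range(steps):
        _, grad = loss_and_grad(mu, v0, times, observed)
        mu -= lr * grad
    return mu


# Ten sparse "frames" generated with a ground-truth friction of 0.3.
times = [0.05 * k for k in range(1, 11)]
observed = rollout(0.3, 2.0, times)

mu_est = fit_friction(2.0, times, observed)
final_loss, _ = loss_and_grad(mu_est, 2.0, times, observed)
```

Because the rollout is a closed-form expression in `mu`, the loss gradient is exact and only a handful of frames are needed to recover the parameter. A full system in the spirit of the abstract would replace this one-dimensional model with rigid-body contact dynamics and a rendering loss, which this sketch does not attempt.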
Taxonomy
Topics: Robot Manipulation and Learning · Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications
