ContactGaussian-WM: Learning Physics-Grounded World Model from Videos

Meizhong Wang; Wanxin Jin; Kun Cao; Lihua Xie; Yiguang Hong

arXiv:2602.11021 · cs.RO · February 12, 2026

Open Access

TL;DR

ContactGaussian-WM introduces a physics-grounded world model that learns complex physical interactions directly from sparse, contact-rich videos, enhancing robotic planning and simulation capabilities.

Contribution

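The paper presents a rigid-body world model built on a unified Gaussian representation for visual appearance and collision geometry, trained end to end by differentiating through a closed-form physics engine so that physical properties can be inferred from sparse, contact-rich video.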

Findings

01 Outperforms state-of-the-art methods in simulations and real-world tests.

02 Demonstrates robust generalization to complex physical scenarios.

03 Enables practical applications like data synthesis and real-time MPC.

Abstract

Developing world models that understand complex physical interactions is essential for advancing robotic planning and simulation. However, existing methods often struggle to accurately model the environment under conditions of data scarcity and complex contact-rich dynamic motion. To address these challenges, we propose ContactGaussian-WM, a differentiable physics-grounded rigid-body world model capable of learning intricate physical laws directly from sparse and contact-rich video sequences. Our framework consists of two core components: (1) a unified Gaussian representation for both visual appearance and collision geometry, and (2) an end-to-end differentiable learning framework that differentiates through a closed-form physics engine to infer physical properties from sparse visual observations. Extensive simulations and real-world evaluations demonstrate that ContactGaussian-WM…
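
As a rough illustration of what "differentiating through a closed-form physics engine to infer physical properties from sparse observations" can mean, here is a minimal sketch. It is not the paper's engine and does not use a Gaussian representation. It assumes a toy one-dimensional block sliding under Coulomb friction, whose position has a closed-form expression, and it recovers the friction coefficient from four position samples by gradient descent. The gradient is derived by hand rather than by automatic differentiation, and every name here (`position_and_grad`, `fit_friction`, the constants) is hypothetical.

```python
G = 9.81  # gravitational acceleration in m/s^2


def position_and_grad(mu, v0, t):
    """Closed-form position of a sliding block at time t, and d(position)/d(mu).

    Toy model (an assumption, not the paper's): the block starts at x = 0 with
    speed v0 and decelerates at mu * G until it comes to rest.
    """
    t_stop = v0 / (mu * G)  # time at which friction brings the block to rest
    if t < t_stop:
        x = v0 * t - 0.5 * mu * G * t * t
        dx_dmu = -0.5 * G * t * t
    else:
        x = v0 * v0 / (2.0 * mu * G)  # final resting position
        dx_dmu = -v0 * v0 / (2.0 * mu * mu * G)
    return x, dx_dmu


def fit_friction(times, observed, v0, mu_init=0.8, lr=0.05, steps=2000):
    """Estimate mu by gradient descent on the mean squared position error."""
    mu = mu_init
    for _ in range(steps):
        grad = 0.0
        for t, x_obs in zip(times, observed):
            x, dx_dmu = position_and_grad(mu, v0, t)
            grad += 2.0 * (x - x_obs) * dx_dmu  # chain rule through the model
        mu = max(mu - lr * grad / len(times), 1e-3)  # keep mu positive
    return mu


# Synthetic "sparse observations": four samples generated with a known mu.
TRUE_MU = 0.3
V0 = 2.0
TIMES = [0.1, 0.3, 0.5, 0.9]
OBSERVED = [position_and_grad(TRUE_MU, V0, t)[0] for t in TIMES]

estimated_mu = fit_friction(TIMES, OBSERVED, V0)
print(f"estimated mu: {estimated_mu:.4f}")
```

The same pattern would apply with an automatic-differentiation library in place of the hand-written derivative. The closed form matters because the simulated position stays a smooth function of the parameter across the sliding-to-rest transition, so the gradient remains usable there.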

Taxonomy

Topics: Robot Manipulation and Learning · Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications