Dynamic 3D Gaussian Tracking for Graph-Based Neural Dynamics Modeling
Mingtong Zhang, Kaifeng Zhang, Yunzhu Li

TL;DR
This paper presents a novel framework that learns object dynamics directly from multi-view RGB videos using 3D Gaussian representations and graph neural networks, enabling accurate action-conditioned video prediction and robotic manipulation planning.
Contribution
The paper introduces a 3D Gaussian-based neural dynamics model that explicitly accounts for robot actions and their effects on scene dynamics, learned from multi-view videos, improving prediction and planning in robotic applications.
Findings
Successfully models complex deformable object dynamics.
Enables accurate action-conditioned future state prediction.
Demonstrates effectiveness on various deformable materials.
Abstract
Videos of robots interacting with objects encode rich information about the objects' dynamics. However, existing video prediction approaches typically do not explicitly account for the 3D information from videos, such as robot actions and objects' 3D states, limiting their use in real-world robotic applications. In this work, we introduce a framework to learn object dynamics directly from multi-view RGB videos by explicitly considering the robot's action trajectories and their effects on scene dynamics. We utilize the 3D Gaussian representation of 3D Gaussian Splatting (3DGS) to train a particle-based dynamics model using Graph Neural Networks. This model operates on sparse control particles downsampled from the densely tracked 3D Gaussian reconstructions. By learning the neural dynamics model on offline robot interaction data, our method can predict object motions under varying initial…
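As a rough illustration of the "sparse control particles downsampled from the densely tracked 3D Gaussian reconstructions" step, the sketch below subsamples a dense point set and connects nearby particles into a graph that a Graph Neural Network could operate on. It is a minimal sketch under stated assumptions, not the authors' implementation: farthest point sampling, the radius-based edge rule, the function names, and all numeric values (2000 points, 50 particles, radius 0.3) are hypothetical choices, and random points stand in for tracked Gaussian centers.

```python
import numpy as np


def farthest_point_sampling(points, k, start=0):
    """Greedily pick k indices from points (N, 3) so the picks are well spread.

    Assumption: FPS is one common downsampling choice; the abstract does not
    say which method the paper uses.
    """
    chosen = [start]
    # Distance from every point to its nearest already-chosen point.
    dist = np.linalg.norm(points - points[start], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(dist))
        chosen.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(points - points[nxt], axis=1))
    return np.array(chosen)


def radius_graph(particles, radius):
    """Return directed edges (i, j), i != j, with ||p_i - p_j|| < radius."""
    diff = particles[:, None, :] - particles[None, :, :]
    d = np.linalg.norm(diff, axis=-1)
    not_self = ~np.eye(len(particles), dtype=bool)
    src, dst = np.nonzero((d < radius) & not_self)
    return np.stack([src, dst], axis=1)


# Hypothetical stand-in for the centers of densely tracked 3D Gaussians.
rng = np.random.default_rng(0)
gaussian_centers = rng.uniform(0.0, 1.0, size=(2000, 3))

idx = farthest_point_sampling(gaussian_centers, k=50)
control_particles = gaussian_centers[idx]            # sparse control particles
edges = radius_graph(control_particles, radius=0.3)  # graph connectivity
```

In a particle-based dynamics model of this kind, the control particles would serve as graph nodes and the edges as message-passing connections; robot action information would be added as extra node or edge features, which this sketch omits.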
Peer Reviews
Decision·CoRL 2024
Taxonomy
Topics: Neural Networks and Applications
