Learning 3D Particle-based Simulators from RGB-D Videos
William F. Whitney, Tatiana Lopez-Guevara, Tobias Pfaff, Yulia Rubanova, Thomas Kipf, Kimberly Stachenfeld, Kelsey R. Allen

TL;DR
This paper introduces Visual Particle Dynamics (VPD), a method that learns 3D particle-based simulators directly from RGB-D videos, enabling realistic scene modeling, editing, and long-term prediction without privileged information.
Contribution
VPD is the first end-to-end framework that learns 3D particle representations, dynamics, and rendering from RGB-D videos without requiring ground truth physics data.
Findings
VPD accurately predicts long-term scene evolution.
VPD enables scene editing and view synthesis.
VPD outperforms existing 2D video prediction models.
Abstract
Realistic simulation is critical for applications ranging from robotics to animation. Traditional analytic simulators sometimes struggle to capture sufficiently realistic simulation, which can lead to problems including the well-known "sim-to-real" gap in robotics. Learned simulators have emerged as an alternative for better capturing real-world physical dynamics, but require access to privileged ground truth physics information such as precise object geometry or particle tracks. Here we propose a method for learning simulators directly from observations. Visual Particle Dynamics (VPD) jointly learns a latent particle-based representation of 3D scenes, a neural simulator of the latent particle dynamics, and a renderer that can produce images of the scene from arbitrary views. VPD learns end to end from posed RGB-D videos and does not require access to privileged information. Unlike…
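As a rough illustration of what "posed RGB-D" input makes possible, the sketch below lifts one depth image into world-space 3D points using standard pinhole unprojection. It is a minimal sketch, not the paper's encoder: the text above does not say how VPD builds its latent particles, and the function name `unproject_depth`, the intrinsics `fx`, `fy`, `cx`, `cy`, and the 4x4 `cam_to_world` pose layout are all assumptions made for this example.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def unproject_depth(
    depth: Sequence[Sequence[float]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
    cam_to_world: Sequence[Sequence[float]],
) -> List[Point]:
    """Lift each valid depth pixel to a 3D point in world coordinates.

    Hypothetical helper for illustration only.
    depth[v][u] is the distance along the camera z-axis.
    cam_to_world is a 4x4 row-major camera-to-world pose matrix.
    """
    points: List[Point] = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0.0:  # treat non-positive depth as a missing measurement
                continue
            # Pinhole model: pixel (u, v) at depth z -> camera-frame point.
            x_cam = (u - cx) * z / fx
            y_cam = (v - cy) * z / fy
            cam = (x_cam, y_cam, z, 1.0)
            # Apply the rigid camera-to-world transform (first three rows).
            world = tuple(
                sum(cam_to_world[r][c] * cam[c] for c in range(4))
                for r in range(3)
            )
            points.append(world)
    return points


# Toy 2x2 depth image with one missing pixel; the camera is shifted +1 along x.
depth = [[2.0, 0.0], [2.0, 4.0]]
pose = [
    [1.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
points = unproject_depth(depth, fx=1.0, fy=1.0, cx=0.5, cy=0.5, cam_to_world=pose)
```

Each returned point lies on a visible surface, since a depth image records only the first surface hit along each pixel's ray. Points from several posed views can be concatenated because they share one world frame.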
Peer Reviews
Decision·ICLR 2024 poster
The motivation and building blocks of this paper are clear and reasonable. The choices of neural simulator and renderer are up-to-date. With a general, unstructured, and differentiable simulator (GNN) and renderer (NeRF), it is easier to train this system end-to-end. Moreover, training the system end-to-end can indeed improve performance. Without carefully handcrafted dynamics and rendering models, this pipeline can reconstruct complex 3D dynamic information from 2.5D input. The l
The method actually captures only the surface points. With only the depth values, no inner points are reconstructed. We can also see this effect in the demo videos, where objects seem to be hollow. The algorithm might rely on high-quality, multi-view RGB-D videos. In the experiments, the background is clean and the objects are relatively simple. There are few tests on real-world data where the depth values are noisy and the view angles are sparse. More like a systematic inte
**Presentation**
- The paper is generally easy to follow, and the video results on the website are helpful.

**Method**
- The point-based representations enable 3D editing capability during/before simulation.
- The use of PointNeRF is a nice design choice that couples particle-based simulation and differentiable rendering.
**Writing**
- The idea of learning dynamics/physics simulators from videos is not particularly new (e.g., NeRF-dy, [A-B]), but the intro and related work are positioned in a way that makes those works appear irrelevant. To give the readers sufficient context, I would recommend putting more effort into discussing the existing "video->simulator" works, their limitations, and the key differences in this work.

**Efficiency**
- One advantage of using a latent representation for simulation learnin
The paper uses an explicit point cloud representation, so it supports scene editing and novel-view re-rendering. The experimental results show the effectiveness of the method in these applications.
The claim that existing learned world models are not simulators is not accurate. The function of a simulator is to estimate the transition model of the system. The main weakness of the proposed simulator is that it does not support simulating the environment's response to different actions. Therefore, it can only predict the future state of a passive system. Secondly, the authors do not validate the extrapolation ability, which is important for a simulator. Experiments on out-of-distributio
Taxonomy
Topics: Computer Graphics and Visualization Techniques · Generative Adversarial Networks and Image Synthesis · Advanced Vision and Imaging
