DynaRend: Learning 3D Dynamics via Masked Future Rendering for Robotic Manipulation
Jingyi Tian, Le Wang, Sanping Zhou, Sen Wang, Jiayi Li, Gang Hua

TL;DR
DynaRend is a 3D-aware, dynamics-informed representation learning framework for robotic manipulation. It pretrains on multi-view RGB-D videos using masked reconstruction and future prediction through differentiable volumetric rendering, so that a single representation jointly captures geometry, semantics, and dynamics.
Contribution
The paper proposes DynaRend, a unified framework that learns 3D geometry, semantics, and dynamics simultaneously through masked reconstruction and future prediction, improving transferability to manipulation tasks.
Findings
Policy success rates improve significantly on manipulation benchmarks.
Generalization to environmental changes improves.
The approach is shown to be effective for real-world robotic manipulation.
Abstract
Learning generalizable robotic manipulation policies remains a key challenge due to the scarcity of diverse real-world training data. While recent approaches have attempted to mitigate this through self-supervised representation learning, most either rely on 2D vision pretraining paradigms such as masked image modeling, which primarily focus on static semantics or scene geometry, or utilize large-scale video prediction models that emphasize 2D dynamics, thus failing to jointly learn the geometry, semantics, and dynamics required for effective manipulation. In this paper, we present DynaRend, a representation learning framework that learns 3D-aware and dynamics-informed triplane features via masked reconstruction and future prediction using differentiable volumetric rendering. By pretraining on multi-view RGB-D video data, DynaRend jointly captures spatial geometry, future dynamics, and…
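The abstract names differentiable volumetric rendering as the mechanism behind both masked reconstruction and future prediction, but this summary does not give the paper's exact formulation. As a rough illustration only, the sketch below shows the standard alpha-compositing quadrature commonly used in volume rendering: per-sample densities along a camera ray are converted to weights, and a rendered value (a color channel, a depth, or a feature) is the weighted sum. The function name `composite_ray` and its arguments are hypothetical and are not taken from DynaRend; in a triplane setup the densities and values would be decoded from features sampled at each 3D point, which is omitted here.

```python
import math


def composite_ray(densities, values, deltas):
    """Alpha-composite per-sample values along a single ray.

    densities: non-negative volume density at each sample (hypothetical input)
    values:    scalar value at each sample, e.g. one color channel or depth
    deltas:    distance between consecutive samples along the ray
    Returns the rendered value and the per-sample compositing weights.
    """
    transmittance = 1.0  # fraction of light that has not been absorbed so far
    weights = []
    for sigma, delta in zip(densities, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this segment
        weights.append(transmittance * alpha)
        transmittance *= 1.0 - alpha
    rendered = sum(w * v for w, v in zip(weights, values))
    return rendered, weights
```

Because every step is a smooth function of the densities and values, a loss on the rendered output (for example, against observed RGB-D pixels of the current or a future frame) can be backpropagated to whatever produces them. How DynaRend masks its features and predicts future states is not specified in this summary and is not modeled here.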
