R3D: Revisiting 3D Policy Learning

Zhengdong Hong; Shenrui Wu; Haozhe Cui; Boyi Zhao; Ran Ji; Yiyang He; Hangxing Zhang; Zundong Ke; Jun Wang; Guofeng Zhang; Jiayuan Gu

arXiv:2604.15281 · cs.CV · April 17, 2026

TL;DR

This paper introduces R3D, a scalable transformer-based 3D policy learning architecture that improves stability and performance on manipulation tasks by addressing the causes of training instability and overfitting and by leveraging large-scale pre-training.

Contribution

The work proposes a novel architecture that couples a transformer-based 3D encoder with a diffusion decoder, designed to train stably at scale and to leverage large-scale pre-training for 3D policy learning.
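A diffusion decoder of this kind is commonly trained by corrupting ground-truth actions with Gaussian noise and teaching a network to predict that noise. The sketch below shows a generic DDPM-style epsilon-prediction objective, assuming a linear beta schedule; it is not taken from the R3D code. `predict_noise` is a hypothetical stand-in for the decoder, and `condition` is a hypothetical placeholder for features produced by the 3D encoder.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_t (a common DDPM default, assumed here)."""
    return [beta_start + (beta_end - beta_start) * t / (num_steps - 1)
            for t in range(num_steps)]


def cumulative_alphas(betas):
    """Running products of (1 - beta_t), usually written alpha_bar_t."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def add_noise(action, noise, alpha_bar_t):
    """Forward process: a_t = sqrt(alpha_bar_t) * a_0 + sqrt(1 - alpha_bar_t) * eps."""
    keep, mix = math.sqrt(alpha_bar_t), math.sqrt(1.0 - alpha_bar_t)
    return [keep * a + mix * e for a, e in zip(action, noise)]


def denoising_loss(predict_noise, action, condition, alpha_bars, rng):
    """Single-sample epsilon-prediction MSE.

    predict_noise(noisy_action, t, condition) is a hypothetical stand-in for the
    decoder network; condition is a placeholder for 3D-encoder features.
    """
    t = rng.randrange(len(alpha_bars))               # random diffusion step
    noise = [rng.gauss(0.0, 1.0) for _ in action]    # eps ~ N(0, I)
    noisy = add_noise(action, noise, alpha_bars[t])
    pred = predict_noise(noisy, t, condition)
    return sum((p - e) ** 2 for p, e in zip(pred, noise)) / len(action)
```

A predictor that recovered the noise exactly would drive this loss to zero. At inference time, the decoder would instead start from pure noise and iteratively denoise toward an action, conditioned on the encoder output.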

Findings

1. Outperforms state-of-the-art 3D baselines on challenging manipulation benchmarks.
2. Identifies the omission of 3D data augmentation and the adverse effects of Batch Normalization as primary causes of training instability and overfitting.
3. Establishes a new, robust foundation for scalable 3D imitation learning.
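The page does not describe which 3D augmentations R3D uses. As an illustration only, a common choice for point-cloud policies is to apply a random rigid transform to the observation, and the key detail is that the same transform must be applied to the action targets. The sketch assumes actions are end-effector positions expressed in the same world frame as the point cloud; the yaw-plus-translation transform and its ranges are hypothetical defaults.

```python
import math
import random


def random_rigid_transform(rng, max_yaw=math.pi / 6, max_shift=0.05):
    """Sample a rotation about the vertical (z) axis plus a small translation.

    The ranges are illustrative defaults, not values from the paper.
    """
    yaw = rng.uniform(-max_yaw, max_yaw)
    shift = [rng.uniform(-max_shift, max_shift) for _ in range(3)]
    c, s = math.cos(yaw), math.sin(yaw)
    rot = [[c, -s, 0.0],
           [s, c, 0.0],
           [0.0, 0.0, 1.0]]
    return rot, shift


def transform_point(rot, shift, p):
    """Apply x' = R x + t to one 3D point."""
    return [sum(rot[i][j] * p[j] for j in range(3)) + shift[i] for i in range(3)]


def augment_sample(points, action_positions, rng):
    """Apply ONE shared transform to the point cloud and the action targets.

    Transforming only the observation would break the correspondence between what
    the policy sees and where it is supervised to move. Only positions are handled
    here; end-effector orientations would also need to be rotated by R.
    """
    rot, shift = random_rigid_transform(rng)
    new_points = [transform_point(rot, shift, p) for p in points]
    new_actions = [transform_point(rot, shift, q) for q in action_positions]
    return new_points, new_actions
```

Because the transform is rigid, distances within the scene and between the scene and the action targets are preserved, so the augmented sample is still a physically consistent demonstration.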

Abstract

3D policy learning promises superior generalization and cross-embodiment transfer, but progress has been hindered by training instabilities and severe overfitting, precluding the adoption of powerful 3D perception models. In this work, we systematically diagnose these failures, identifying the omission of 3D data augmentation and the adverse effects of Batch Normalization as primary causes. We propose a new architecture coupling a scalable transformer-based 3D encoder with a diffusion decoder, engineered specifically for stability at scale and designed to leverage large-scale pre-training. Our approach significantly outperforms state-of-the-art 3D baselines on challenging manipulation benchmarks, establishing a new and robust foundation for scalable 3D imitation learning. Project Page: https://r3d-policy.github.io/
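The abstract names the adverse effects of Batch Normalization as a primary cause of instability but does not spell out the mechanism here. One well-known property, shown in this sketch as an assumption about what may be involved, is that BatchNorm in training mode normalizes each feature with statistics pooled over the mini-batch, so a sample's output depends on its batchmates (and switches to running averages at evaluation time), whereas LayerNorm normalizes each sample on its own. `batch_norm_train` and `layer_norm` are hypothetical minimal versions without learnable scale and shift; whether this is the exact failure mode R3D identifies is not stated on this page.

```python
import math


def batch_norm_train(batch, eps=1e-5):
    """Training-mode BatchNorm: per-feature mean/variance computed across the batch."""
    n, d = len(batch), len(batch[0])
    means = [sum(x[j] for x in batch) / n for j in range(d)]
    variances = [sum((x[j] - means[j]) ** 2 for x in batch) / n for j in range(d)]
    return [[(x[j] - means[j]) / math.sqrt(variances[j] + eps) for j in range(d)]
            for x in batch]


def layer_norm(batch, eps=1e-5):
    """LayerNorm: each sample is normalized across its own features only."""
    out = []
    for x in batch:
        mean = sum(x) / len(x)
        var = sum((xi - mean) ** 2 for xi in x) / len(x)
        out.append([(xi - mean) / math.sqrt(var + eps) for xi in x])
    return out
```

Normalizing the same sample inside two different batches gives different BatchNorm outputs but identical LayerNorm outputs, which is one reason transformer encoders typically rely on LayerNorm.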

Code & Models

Repositories

https://r3d-policy.github.io (GitHub)

Models

eddie-cui/r3d-weights (Hugging Face model)

Datasets

eddie-cui/r3d (Hugging Face dataset, 2.7k downloads)