3D Equivariant Visuomotor Policy Learning via Spherical Projection

Boce Hu; Dian Wang; David Klee; Heng Tian; Xupeng Zhu; Haojie Huang; Robert Platt; Robin Walters

arXiv:2505.16969 · cs.RO · October 31, 2025

TL;DR

This paper introduces ISP, a novel SO(3)-equivariant visuomotor policy framework that uses spherical projection of RGB images, improving data efficiency and performance in robotic manipulation tasks.

Contribution

It presents the first SO(3)-equivariant policy learning method that uses only monocular RGB images, via spherical projection. This closes a gap left by prior equivariant policies, which relied on point clouds from multiple cameras fixed in the workspace.
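The idea of projecting an image onto a sphere can be illustrated with a minimal sketch. It assumes a simple pinhole camera with intrinsics fx, fy, cx, cy and sends each pixel's feature along its viewing ray onto the unit sphere. The function names (pixel_to_sphere, camera_to_world, project_feature_map) are hypothetical, and this geometric construction is a stand-in for illustration, not necessarily the projection used in the paper.

```python
import math
from typing import Any, List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def pixel_to_sphere(u: float, v: float, fx: float, fy: float,
                    cx: float, cy: float) -> Vec3:
    """Back-project pixel (u, v) through a pinhole camera and normalize
    the viewing ray so it lands on the unit sphere (camera frame)."""
    x = (u - cx) / fx
    y = (v - cy) / fy
    z = 1.0
    norm = math.sqrt(x * x + y * y + z * z)
    return (x / norm, y / norm, z / norm)


def camera_to_world(rotation: Sequence[Sequence[float]], p: Vec3) -> Vec3:
    """Rotate a sphere point from the camera frame into a world frame, given a
    3x3 camera-to-world rotation (assumed known, e.g. from the gripper pose)."""
    return tuple(sum(rotation[i][j] * p[j] for j in range(3)) for i in range(3))


def project_feature_map(features: Sequence[Sequence[Any]], fx: float, fy: float,
                        cx: float, cy: float) -> List[Tuple[Vec3, Any]]:
    """Turn an H x W grid of per-pixel features into (sphere point, feature) pairs."""
    samples = []
    for v, row in enumerate(features):
        for u, feat in enumerate(row):
            samples.append((pixel_to_sphere(u, v, fx, fy, cx, cy), feat))
    return samples


# Usage: a 3x3 "feature map" with the principal point at the center pixel.
samples = project_feature_map([[1, 2, 3], [4, 5, 6], [7, 8, 9]],
                              fx=1.0, fy=1.0, cx=1.0, cy=1.0)
print(samples[4])  # center pixel -> ((0.0, 0.0, 1.0), 5)
```

Because every pinhole ray has a positive z component, a single camera only covers a cap of the sphere. For an eye-in-hand camera, applying the current camera-to-world rotation places that cap consistently in a fixed frame, which is what makes rotations of the view act as rotations of the sphere.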

Findings

01. Outperforms baselines in simulation and real-world tasks

02. Enhances sample efficiency in visuomotor learning

03. Works effectively with monocular RGB inputs

Abstract

Equivariant models have recently been shown to improve the data efficiency of diffusion policy by a significant margin. However, prior work that explored this direction focused primarily on point cloud inputs generated by multiple cameras fixed in the workspace. This type of point cloud input is not compatible with the now-common setting where the primary input modality is an eye-in-hand RGB camera like a GoPro. This paper closes this gap by incorporating into the diffusion policy model a process that projects features from the 2D RGB camera image onto a sphere. This enables us to reason about symmetries in $SO(3)$ without explicitly reconstructing a point cloud. We perform extensive experiments in both simulation and the real world that demonstrate that our method consistently outperforms strong baselines in terms of both performance and sample efficiency. Our work,…
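The benefit of working on a sphere is that rotations act on it directly: a rotation R of the view moves each projected feature from x to R·x, and an $SO(3)$-equivariant model must respond by rotating its output, f(R·x) = R·f(x). The toy sketch below checks this property numerically for a deliberately simple, hypothetical head (a feature-weighted sum of directions). It assumes the spherical signal is stored as (unit vector, scalar) pairs; the names rotate_signal and toy_equivariant_head are illustrative and do not describe the paper's network, which builds this symmetry into a diffusion policy.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = List[List[float]]
Signal = List[Tuple[Vec3, float]]  # (unit direction, scalar feature) pairs


def rot_x(theta: float) -> Mat3:
    c, s = math.cos(theta), math.sin(theta)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def rot_z(theta: float) -> Mat3:
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def matmul(a: Mat3, b: Mat3) -> Mat3:
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def apply(r: Mat3, p: Sequence[float]) -> Vec3:
    return tuple(sum(r[i][j] * p[j] for j in range(3)) for i in range(3))


def rotate_signal(r: Mat3, signal: Signal) -> Signal:
    """Act on a spherical signal with a rotation by moving each sample point."""
    return [(apply(r, p), w) for p, w in signal]


def toy_equivariant_head(signal: Signal) -> Vec3:
    """Hypothetical output head: the feature-weighted sum of directions.
    It is linear in the points, so head(R . signal) == R . head(signal)."""
    total = [0.0, 0.0, 0.0]
    for point, weight in signal:
        for i in range(3):
            total[i] += weight * point[i]
    return (total[0], total[1], total[2])


signal = [((1.0, 0.0, 0.0), 0.5), ((0.0, 1.0, 0.0), 2.0), ((0.0, 0.6, 0.8), -1.0)]
R = matmul(rot_z(0.7), rot_x(-0.3))
lhs = toy_equivariant_head(rotate_signal(R, signal))  # rotate input, then head
rhs = apply(R, toy_equivariant_head(signal))          # head, then rotate output
print(all(abs(a - b) < 1e-9 for a, b in zip(lhs, rhs)))  # True
```

A linear head is used here only because equivariance then holds exactly by construction; learned equivariant networks obtain the same guarantee by constraining their layers rather than by relying on a single weighted sum.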

Videos

3D Equivariant Visuomotor Policy Learning via Spherical Projection · SlidesLive