VolumeDP: Modeling Volumetric Representation for Manipulation Policy Learning

Tianxing Zhou; Feiyang Xue; Zhangchen Ye; Tianyuan Yuan; Hang Zhao; Tao Jiang

arXiv:2603.17720·cs.RO·March 19, 2026

Open Access

TL;DR

VolumeDP is a 3D-aware policy architecture for robotic manipulation that reasons explicitly in volumetric space, improving success rates and robustness over existing methods that map 2D image observations directly to 3D actions.

Contribution

The paper proposes VolumeDP, a novel volumetric representation-based policy architecture that enhances spatial reasoning and robustness in robotic manipulation tasks.

Findings

1. Achieves an 88.8% average success rate on the LIBERO benchmark, outperforming baselines by 14.8%.

2. Demonstrates superior performance on the ManiSkill and LIBERO-Plus benchmarks.

3. Shows improved real-world generalization to new spatial layouts and viewpoints.

Abstract

Imitation learning is a prominent paradigm for robotic manipulation. However, existing visual imitation methods map 2D image observations directly to 3D action outputs, imposing a 2D-3D mismatch that hinders spatial reasoning and degrades robustness. We present VolumeDP, a policy architecture that restores spatial alignment by explicitly reasoning in 3D. VolumeDP first lifts image features into a Volumetric Representation via cross-attention. It then selects task-relevant voxels with a learnable module and converts them into a compact set of spatial tokens, markedly reducing computation while preserving action-critical geometry. Finally, a multi-token decoder conditions on the entire token set to predict actions, thereby avoiding lossy aggregation that collapses multiple spatial tokens into a single descriptor. VolumeDP achieves a state-of-the-art average success rate of 88.8% on the…
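
The three stages named in the abstract (cross-attention lifting, voxel selection, multi-token decoding) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the grid size, feature width, hard top-k selection, linear scoring head, and single cross-attention readout are all assumptions chosen for brevity, and every function and variable name below is hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attend(queries, keys_values):
    # Single-head cross-attention without learned projections (assumption).
    # queries: (Q, D), keys_values: (N, D) -> output (Q, D), weights (Q, N)
    d = queries.shape[-1]
    weights = softmax(queries @ keys_values.T / np.sqrt(d), axis=-1)
    return weights @ keys_values, weights

def select_voxels(voxel_feats, score_weights, k):
    # Hypothetical scoring head: a linear score per voxel, then hard top-k.
    # The paper only says the selection module is learnable; hard top-k is
    # an illustrative stand-in and is not differentiable as written.
    scores = voxel_feats @ score_weights           # (V,)
    kept = np.argsort(scores)[-k:][::-1]           # indices, best first
    return voxel_feats[kept], kept, scores

rng = np.random.default_rng(0)
V, N, D = 4 * 4 * 4, 2 * 7 * 7, 16   # voxels, image tokens, feature width (assumed)
K, H, A = 8, 4, 7                    # kept tokens, action horizon, action dim (assumed)

voxel_queries = rng.normal(size=(V, D))   # one query per voxel
image_feats = rng.normal(size=(N, D))     # flattened multi-view image features
score_weights = rng.normal(size=(D,))
action_queries = rng.normal(size=(H, D))
out_proj = rng.normal(size=(D, A))

# 1) Lift image features into the volume: each voxel attends over image tokens.
voxel_feats, lift_weights = cross_attend(voxel_queries, image_feats)

# 2) Keep a compact set of spatial tokens.
spatial_tokens, kept, scores = select_voxels(voxel_feats, score_weights, K)

# 3) Decode from the whole token set: each action query attends over all K
#    tokens instead of a single pooled descriptor.
readout, decode_weights = cross_attend(action_queries, spatial_tokens)
actions = readout @ out_proj

print(voxel_feats.shape, spatial_tokens.shape, actions.shape)
```

The point of step 3 is that the decoder sees all K tokens; replacing it with `spatial_tokens.mean(axis=0)` would be the kind of lossy aggregation the abstract says VolumeDP avoids.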

Taxonomy

Topics: Robot Manipulation and Learning · Multimodal Machine Learning Applications · Reinforcement Learning in Robotics