PocketDP3: Efficient Pocket-Scale 3D Visuomotor Policy

Jinhao Zhang; Zhexuan Zhou; Huizhe Li; Yichen Lai; Wenlong Xia; Haoming Song; Youmin Gong; Jie Mei

arXiv:2601.22018·cs.RO·February 2, 2026

TL;DR

PocketDP3 is a lightweight 3D diffusion policy built on an MLP-Mixer-based architecture. It achieves state-of-the-art robotic manipulation performance with significantly fewer parameters and faster inference, which makes it suitable for real-time applications.

Contribution

The paper proposes a compact 3D diffusion policy architecture that replaces the heavy conditional U-Net decoder of prior methods with a lightweight Diffusion Mixer (DiM), enabling efficient, real-time robotic manipulation.

Findings

01. Achieves state-of-the-art results on three benchmarks.

02. Uses fewer than 1% of the parameters of prior methods.

03. Supports two-step inference without consistency distillation and without performance loss.

Abstract

Recently, 3D vision-based diffusion policies have shown strong capability in learning complex robotic manipulation skills. However, a common architectural mismatch exists in these models: a tiny yet efficient point-cloud encoder is often paired with a massive decoder. Given a compact scene representation, we argue that this may lead to substantial parameter waste in the decoder. Motivated by this observation, we propose PocketDP3, a pocket-scale 3D diffusion policy that replaces the heavy conditional U-Net decoder used in prior methods with a lightweight Diffusion Mixer (DiM) built on MLP-Mixer blocks. This architecture enables efficient fusion across temporal and channel dimensions, significantly reducing model size. Notably, without any additional consistency distillation techniques, our method supports two-step inference without sacrificing performance, improving practicality for…
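
To make the idea of fusion across temporal and channel dimensions concrete, here is a minimal NumPy sketch of a single generic MLP-Mixer block applied to an action sequence of shape (T, C). It is not the authors' DiM implementation: the horizon `T`, channel width `C`, hidden sizes, initialization, and all function names are assumptions chosen for illustration, and the conditioning on the point-cloud feature and the diffusion timestep is omitted.

```python
import numpy as np

def gelu(x):
    # Tanh approximation of the GELU activation.
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def layer_norm(x, eps=1e-5):
    # Normalize over the last axis (no learned scale or shift in this sketch).
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def init_mlp(rng, dim, hidden):
    # Two-layer MLP mapping dim -> hidden -> dim (hypothetical initialization).
    return {
        "w1": rng.normal(0.0, 0.02, (dim, hidden)),
        "b1": np.zeros(hidden),
        "w2": rng.normal(0.0, 0.02, (hidden, dim)),
        "b2": np.zeros(dim),
    }

def mlp(p, x):
    # Applies the MLP along the last axis of x.
    return gelu(x @ p["w1"] + p["b1"]) @ p["w2"] + p["b2"]

def mixer_block(params, x):
    # x has shape (T, C): T action steps, C channels per step.
    # Temporal mixing: transpose so the MLP acts along the T axis.
    y = layer_norm(x)
    x = x + mlp(params["time"], y.T).T
    # Channel mixing: the MLP acts along the C axis of each step.
    y = layer_norm(x)
    x = x + mlp(params["chan"], y)
    return x

def count_params(params):
    # Total number of scalar weights and biases in the block.
    return sum(w.size for p in params.values() for w in p.values())

rng = np.random.default_rng(0)
T, C = 16, 64  # assumed horizon and channel width, not taken from the paper
params = {"time": init_mlp(rng, T, 32), "chan": init_mlp(rng, C, 128)}
x = rng.normal(size=(T, C))   # stand-in for a noisy action sequence
out = mixer_block(params, x)  # same shape as the input
print(out.shape, count_params(params))
```

With these assumed sizes the block holds 17,648 parameters, which illustrates why stacking a few such blocks can stay small; the actual parameter budget of PocketDP3 depends on choices the abstract does not specify.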

Taxonomy

Topics: Reinforcement Learning in Robotics · Robot Manipulation and Learning · Generative Adversarial Networks and Image Synthesis