ReMAP-DP: Reprojected Multi-view Aligned PointMaps for Diffusion Policy
Xinzhang Yang, Renjun Wu, Jinyan Liu, Xuesong Li

TL;DR
ReMAP-DP combines standardized perspective reprojection with a dual-stream diffusion policy to improve 3D spatial awareness in robot manipulation, outperforming existing methods in both simulation and real-world experiments.
Contribution
A structure-aware dual-stream diffusion policy that fuses frozen semantic features with explicit geometric descriptors from pixel-aligned PointMaps, computed on standardized perspective-reprojected views.
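The fusion step can be sketched in NumPy to show how patch tokens from the two streams might be combined with learnable modality embeddings. This is an assumption-laden illustration, not the paper's architecture. The frozen semantic encoder appears only through its output `sem_feats`, with one vector per p x p patch as a ViT-style backbone would produce. The geometric encoder is a single linear layer over flattened PointMap patches, and fusion is plain token concatenation. All names (`patchify`, `init_params`, `fuse_tokens`) are hypothetical. Because the PointMap is pixel-aligned with the image and both streams use the same patch grid, token i of each stream covers the same pixels.

```python
import numpy as np


def patchify(x, p):
    """Split an HxWxC map into row-major p x p patches, each flattened to p*p*C values."""
    h, w, c = x.shape
    assert h % p == 0 and w % p == 0, "H and W must be multiples of the patch size"
    x = x.reshape(h // p, p, w // p, p, c).transpose(0, 2, 1, 3, 4)
    return x.reshape((h // p) * (w // p), p * p * c)


def init_params(sem_dim, p, d_model, rng):
    """Hypothetical trainable parameters (randomly initialised here, not trained)."""
    return {
        "w_sem": rng.normal(0.0, 0.02, (sem_dim, d_model)),    # projects frozen semantic features
        "w_geo": rng.normal(0.0, 0.02, (p * p * 3, d_model)),  # linear stand-in for the geometry encoder
        "e_sem": rng.normal(0.0, 0.02, d_model),               # learnable modality embedding, semantic stream
        "e_geo": rng.normal(0.0, 0.02, d_model),               # learnable modality embedding, geometric stream
    }


def fuse_tokens(sem_feats, pointmap, params, p):
    """Return a (2N, d_model) sequence: N semantic tokens followed by N geometric tokens.

    sem_feats: (N, sem_dim) features from a frozen encoder, one per p x p patch.
    pointmap:  (H, W, 3) pixel-aligned PointMap, with N = (H // p) * (W // p).
    """
    sem = sem_feats @ params["w_sem"] + params["e_sem"]
    geo = patchify(pointmap, p) @ params["w_geo"] + params["e_geo"]
    assert sem.shape == geo.shape, "both streams must share the same patch grid"
    return np.concatenate([sem, geo], axis=0)
```

Concatenation with per-stream type embeddings is one common way to let a downstream transformer tell the two modalities apart. Additive or cross-attention fusion would be equally plausible readings of the summary.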
Findings
Achieves a 59.3% success rate on RoboTwin 2.0, surpassing DP3 by 6.6%.
Improves performance by 28% on the ManiSkill 3 Stack Cube task.
Demonstrates robustness and data efficiency on high-precision real-world manipulation tasks.
Abstract
Generalist robot policies built upon 2D visual representations excel at semantic reasoning but inherently lack the explicit 3D spatial awareness required for high-precision tasks. Existing 3D integration methods struggle to bridge this gap due to the structural irregularity of sparse point clouds and the geometric distortion introduced by multi-view orthographic rendering. To overcome these barriers, we present ReMAP-DP, a novel framework synergizing standardized perspective reprojection with a structure-aware dual-stream diffusion policy. By coupling the reprojected views with pixel-aligned PointMaps, our dual-stream architecture leverages learnable modality embeddings to fuse frozen semantic features and explicit geometric descriptors, ensuring precise implicit patch-level alignment. Extensive experiments across simulation and real-world environments demonstrate ReMAP-DP's superior…
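The pairing of perspective reprojection with pixel-aligned PointMaps can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes a pinhole camera with intrinsics `K` and 4x4 homogeneous extrinsics, plus a hypothetical standardized virtual camera (`K_virt`, `world_to_virt`) chosen by the user. Occlusions are resolved with a simple nearest-point z-buffer, and pixels that receive no point are left at zero with no hole filling. The function names `depth_to_pointmap` and `reproject` are hypothetical.

```python
import numpy as np


def depth_to_pointmap(depth, K, cam_to_world):
    """Unproject an HxW depth map into an HxWx3 world-frame PointMap.

    Entry (v, u) holds the 3D point observed at pixel (v, u), so the
    PointMap is pixel-aligned with the camera image.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # u: column index, v: row index
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    pts_cam = np.stack([x, y, depth, np.ones_like(depth)], axis=-1)  # HxWx4 homogeneous
    return (pts_cam @ cam_to_world.T)[..., :3]


def reproject(points, colors, K_virt, world_to_virt, h, w):
    """Perspective-project world points into a virtual hxw camera using a z-buffer.

    Returns (image, pointmap) for the virtual view; pixels hit by no point stay 0.
    """
    pts = points.reshape(-1, 3)
    cols = colors.reshape(len(pts), -1)
    homo = np.concatenate([pts, np.ones((len(pts), 1))], axis=1)
    cam = (homo @ world_to_virt.T)[:, :3]
    front = cam[:, 2] > 1e-6  # drop points behind the virtual camera
    pts, cols, cam = pts[front], cols[front], cam[front]
    z = cam[:, 2]
    u = np.round(K_virt[0, 0] * cam[:, 0] / z + K_virt[0, 2]).astype(int)
    v = np.round(K_virt[1, 1] * cam[:, 1] / z + K_virt[1, 2]).astype(int)
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    pts, cols, z, u, v = pts[inside], cols[inside], z[inside], u[inside], v[inside]

    # Z-buffer: sort by pixel, then by depth, and keep the nearest point per pixel.
    pix = v * w + u
    order = np.lexsort((z, pix))  # the last key (pix) is the primary sort key
    _, first = np.unique(pix[order], return_index=True)
    keep = order[first]

    image = np.zeros((h * w, cols.shape[1]))
    pointmap = np.zeros((h * w, 3))
    image[pix[keep]] = cols[keep]
    pointmap[pix[keep]] = pts[keep]
    return image.reshape(h, w, -1), pointmap.reshape(h, w, 3)
```

Each surviving point writes its color and its 3D coordinates to the same pixel, so the reprojected image and its PointMap stay pixel-aligned by construction.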
Taxonomy
Topics: 3D Shape Modeling and Analysis · Generative Adversarial Networks and Image Synthesis · Robot Manipulation and Learning
