UMI-3D: Extending Universal Manipulation Interface from Vision-Limited to 3D Spatial Perception

Ziming Wang

arXiv:2604.14089·cs.RO·April 16, 2026

TL;DR

UMI-3D extends the Universal Manipulation Interface (UMI) with a lightweight, low-cost wrist-mounted LiDAR sensor, adding LiDAR-centric SLAM for metric-scale pose estimation and more reliable data collection in occluded or dynamic real-world scenes.

Contribution

Introduces a lightweight, low-cost LiDAR extension to UMI's wrist-mounted interface that improves demonstration data quality and supports learning manipulation of deformable and articulated objects while keeping the device portable.

Findings

01

Achieves high success rates on standard manipulation tasks.

02

Enables learning of complex tasks like deformable and articulated object manipulation.

03

Supports an end-to-end open-source data collection and deployment pipeline.

Abstract

We present UMI-3D, a multimodal extension of the Universal Manipulation Interface (UMI) for robust and scalable data collection in embodied manipulation. While UMI enables portable, wrist-mounted data acquisition, its reliance on monocular visual SLAM makes it vulnerable to occlusions, dynamic scenes, and tracking failures, limiting its applicability in real-world environments. UMI-3D addresses these limitations by introducing a lightweight and low-cost LiDAR sensor tightly integrated into the wrist-mounted interface, enabling LiDAR-centric SLAM with accurate metric-scale pose estimation under challenging conditions. We further develop a hardware-synchronized multimodal sensing pipeline and a unified spatiotemporal calibration framework that aligns visual observations with LiDAR point clouds, producing consistent 3D representations of demonstrations. Despite maintaining the original 2D…
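The abstract mentions aligning visual observations with LiDAR point clouds in both time and space. As a rough illustration of what such an alignment involves, and not the authors' pipeline, the sketch below pairs a LiDAR scan with the nearest camera frame by timestamp and projects the scan into that image. It assumes an ideal pinhole camera with any lens distortion ignored, and a known LiDAR-to-camera extrinsic transform. All names and numbers (`T_CAM_LIDAR`, the intrinsics, the timestamps) are hypothetical values chosen for the example.

```python
from bisect import bisect_left


def nearest_index(sorted_stamps, t):
    """Return the index of the timestamp in sorted_stamps closest to t."""
    i = bisect_left(sorted_stamps, t)
    if i == 0:
        return 0
    if i == len(sorted_stamps):
        return len(sorted_stamps) - 1
    # Pick whichever neighbour is closer in time.
    return i if sorted_stamps[i] - t < t - sorted_stamps[i - 1] else i - 1


def transform_point(T, p):
    """Apply a 4x4 homogeneous transform T (nested lists) to a 3D point p."""
    x, y, z = p
    return tuple(T[r][0] * x + T[r][1] * y + T[r][2] * z + T[r][3] for r in range(3))


def project_points(points_lidar, T_cam_lidar, fx, fy, cx, cy, width, height):
    """Project LiDAR points to pixels with an ideal pinhole model.

    Returns (u, v, depth) for points in front of the camera and inside the image.
    """
    pixels = []
    for p in points_lidar:
        xc, yc, zc = transform_point(T_cam_lidar, p)
        if zc <= 0.0:  # point lies behind the camera
            continue
        u = fx * xc / zc + cx
        v = fy * yc / zc + cy
        if 0.0 <= u < width and 0.0 <= v < height:
            pixels.append((u, v, zc))
    return pixels


# Hypothetical extrinsics: LiDAR frame offset 0.25 m along the camera z-axis.
T_CAM_LIDAR = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.25],
    [0.0, 0.0, 0.0, 1.0],
]
camera_stamps = [0.0, 0.033, 0.066, 0.1]  # seconds, hypothetical
scan_stamp = 0.06                          # seconds, hypothetical
points = [(0.0, 0.0, 0.75), (0.5, 0.0, 0.75), (0.0, 0.0, -1.0)]

frame = nearest_index(camera_stamps, scan_stamp)
pixels = project_points(points, T_CAM_LIDAR, 500.0, 500.0, 320.0, 240.0, 640, 480)
```

Nearest-timestamp matching is only a software fallback. With hardware synchronization, as the abstract describes, the sensors share a trigger or clock, so the residual time offset can be calibrated once instead of being searched for on every scan.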

Code & Models

Repositories

https://umi-3d.github.io
