TL;DR
UMI-3D extends the Universal Manipulation Interface (UMI) by integrating a wrist-mounted LiDAR sensor for robust 3D perception, enabling more reliable data collection and manipulation in complex real-world environments.
Contribution
It introduces a lightweight LiDAR-based extension to UMI, improving data quality and enabling manipulation of challenging objects while maintaining portability.
Findings
Achieves high success rates on standard manipulation tasks.
Enables learning of complex tasks such as deformable and articulated object manipulation.
Supports an end-to-end open-source data collection and deployment pipeline.
Abstract
We present UMI-3D, a multimodal extension of the Universal Manipulation Interface (UMI) for robust and scalable data collection in embodied manipulation. While UMI enables portable, wrist-mounted data acquisition, its reliance on monocular visual SLAM makes it vulnerable to occlusions, dynamic scenes, and tracking failures, limiting its applicability in real-world environments. UMI-3D addresses these limitations by introducing a lightweight and low-cost LiDAR sensor tightly integrated into the wrist-mounted interface, enabling LiDAR-centric SLAM with accurate metric-scale pose estimation under challenging conditions. We further develop a hardware-synchronized multimodal sensing pipeline and a unified spatiotemporal calibration framework that aligns visual observations with LiDAR point clouds, producing consistent 3D representations of demonstrations. Despite maintaining the original 2D…
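The following minimal sketch shows how the outputs of a spatiotemporal calibration align LiDAR points with camera images once they are known. It assumes a pinhole camera, a fixed rigid LiDAR-to-camera transform (R, t), and a constant clock offset between the two sensors. The names `Calibration`, `project`, and `match_frame` are hypothetical and are not part of UMI-3D's released code. The sketch only applies calibration parameters and does not show how UMI-3D estimates them.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Calibration:
    """Hypothetical LiDAR-to-camera calibration (illustrative, not UMI-3D's format)."""
    R: Sequence[Sequence[float]]  # 3x3 rotation, LiDAR frame -> camera frame
    t: Vec3                       # translation in metres, LiDAR frame -> camera frame
    fx: float                     # focal lengths in pixels
    fy: float
    cx: float                     # principal point in pixels
    cy: float
    time_offset: float            # seconds added to LiDAR stamps to express them on the camera clock


def lidar_to_camera(p: Vec3, cal: Calibration) -> Vec3:
    # Rigid transform p_cam = R @ p_lidar + t, written out without NumPy.
    return tuple(sum(cal.R[i][j] * p[j] for j in range(3)) + cal.t[i] for i in range(3))


def project(points: List[Vec3], cal: Calibration) -> List[Tuple[float, float]]:
    """Project LiDAR points to pixel coordinates, dropping points at or behind the camera plane."""
    pixels = []
    for p in points:
        x, y, z = lidar_to_camera(p, cal)
        if z <= 0:
            continue  # not visible to a forward-facing pinhole camera
        pixels.append((cal.fx * x / z + cal.cx, cal.fy * y / z + cal.cy))
    return pixels


def match_frame(lidar_stamp: float, frame_stamps: List[float], cal: Calibration) -> Optional[int]:
    """Index of the camera frame nearest in time to a LiDAR sweep, after applying the clock offset.

    frame_stamps must be sorted in ascending order.
    """
    if not frame_stamps:
        return None
    target = lidar_stamp + cal.time_offset
    i = bisect_left(frame_stamps, target)
    # The nearest stamp is either just before or just at/after the insertion point.
    candidates = [k for k in (i - 1, i) if 0 <= k < len(frame_stamps)]
    return min(candidates, key=lambda k: abs(frame_stamps[k] - target))
```

Consider an identity rotation, zero translation, fx = fy = 500, and principal point (320, 240). Under those parameters, the LiDAR point (0.2, -0.1, 1.0) projects to pixel (420, 190), and a point with negative depth is discarded. Nearest-frame matching is a simplification. A real pipeline with hardware synchronization might instead interpolate poses or point timestamps between frames.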