TacUMI: A Multi-Modal Universal Manipulation Interface for Contact-Rich Tasks
Tailai Cheng, Kejia Chen, Lingyun Chen, Liding Zhang, Yue Zhang, Yao Ling, Mahdi Hamad, Zhenshan Bing, Fan Wu, Karan Sharma, Alois Knoll

TL;DR
TacUMI is a multi-modal data collection and segmentation system for contact-rich manipulation tasks. A handheld gripper records synchronized visuo-tactile (ViTac), force-torque, and pose data during human demonstrations, and temporal models use these signals to detect event boundaries accurately.
Contribution
The paper introduces TacUMI, a novel multi-modal demonstration system with a segmentation framework that improves event boundary detection in contact-rich tasks.
Findings
Achieves over 90% segmentation accuracy.
Multi-modal data improves segmentation performance.
Validates the practicality of TacUMI for complex tasks.
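To make the segmentation findings concrete, the following is a minimal sketch of how per-frame segment labels can be turned into event boundaries and scored against a reference labeling. It assumes accuracy is measured frame by frame, which this summary does not state. The function names and the label names ("reach", "insert", "release") are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple


def event_boundaries(labels: Sequence[str]) -> List[int]:
    """Return every index t where labels[t] differs from labels[t - 1]."""
    return [t for t in range(1, len(labels)) if labels[t] != labels[t - 1]]


def segments(labels: Sequence[str]) -> List[Tuple[str, int, int]]:
    """Collapse per-frame labels into (label, start, end_exclusive) tuples."""
    if not labels:
        return []
    cuts = [0] + event_boundaries(labels) + [len(labels)]
    return [(labels[s], s, e) for s, e in zip(cuts[:-1], cuts[1:])]


def frame_accuracy(pred: Sequence[str], truth: Sequence[str]) -> float:
    """Fraction of frames whose predicted label matches the reference label."""
    if len(pred) != len(truth):
        raise ValueError("pred and truth must have the same number of frames")
    if not truth:
        raise ValueError("cannot score an empty sequence")
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)


# Hypothetical 10-frame demonstration: the prediction places the first
# boundary one frame late, so exactly one frame is mislabeled.
truth = ["reach"] * 3 + ["insert"] * 4 + ["release"] * 3
pred = ["reach"] * 4 + ["insert"] * 3 + ["release"] * 3

print(event_boundaries(truth))
print(event_boundaries(pred))
print(frame_accuracy(pred, truth))
```

Frame-wise accuracy rewards long correct segments, so a boundary that is off by a frame or two costs little. A boundary-level metric with a tolerance window would penalize such shifts differently.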
Abstract
Task decomposition is critical for understanding and learning complex long-horizon manipulation tasks. Especially for tasks involving rich physical interactions, relying solely on visual observations and robot proprioceptive information often fails to reveal the underlying event transitions. This raises the requirement for efficient collection of high-quality multi-modal data as well as a robust segmentation method to decompose demonstrations into meaningful modules. Building on the handheld demonstration device Universal Manipulation Interface (UMI), we introduce TacUMI, a multi-modal data collection system that additionally integrates ViTac sensors, a force-torque sensor, and a pose tracker into a compact, robot-compatible gripper design, enabling synchronized acquisition of all these modalities during human demonstrations. We then propose a multi-modal segmentation…
Taxonomy
Topics: Robot Manipulation and Learning · Human Pose and Action Recognition · Social Robot Interaction and HRI
