TacUMI: A Multi-Modal Universal Manipulation Interface for Contact-Rich Tasks

Tailai Cheng; Kejia Chen; Lingyun Chen; Liding Zhang; Yue Zhang; Yao Ling; Mahdi Hamad; Zhenshan Bing; Fan Wu; Karan Sharma; Alois Knoll

arXiv:2601.14550·cs.RO·January 22, 2026

Open Access

TL;DR

TacUMI is a multi-modal data collection and segmentation system for complex contact-rich manipulation tasks. It integrates ViTac sensors, a force-torque sensor, and a pose tracker into a compact gripper and uses temporal models to detect event boundaries accurately.

Contribution

The paper introduces TacUMI, a novel multi-modal demonstration system with a segmentation framework that improves event boundary detection in contact-rich tasks.

Findings

01. Achieves over 90% segmentation accuracy.

02. Multi-modal data improves segmentation performance.

03. Validates the practicality of TacUMI for complex tasks.

Abstract

Task decomposition is critical for understanding and learning complex long-horizon manipulation tasks. For tasks involving rich physical interactions in particular, relying solely on visual observations and robot proprioceptive information often fails to reveal the underlying event transitions. This calls for efficient collection of high-quality multi-modal data as well as a robust segmentation method to decompose demonstrations into meaningful modules. Building on the idea of the handheld demonstration device Universal Manipulation Interface (UMI), we introduce TacUMI, a multi-modal data collection system that additionally integrates ViTac sensors, a force-torque sensor, and a pose tracker into a compact, robot-compatible gripper design, enabling synchronized acquisition of all these modalities during human demonstrations. We then propose a multi-modal segmentation…
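The abstract mentions synchronized acquisition of several modalities but, as excerpted here, does not say how the streams are aligned. As a minimal sketch of one common approach (nearest-timestamp matching against a reference clock), the example below resamples a faster stream onto a slower one. The function names, the `force_z` key, and the toy timestamps are all hypothetical and are not taken from the paper.

```python
from bisect import bisect_left


def nearest_index(timestamps, t):
    """Return the index of the entry in sorted `timestamps` closest to `t`."""
    i = bisect_left(timestamps, t)
    if i == 0:
        return 0
    if i == len(timestamps):
        return len(timestamps) - 1
    # Pick whichever neighbour is closer; ties go to the earlier sample.
    return i - 1 if t - timestamps[i - 1] <= timestamps[i] - t else i


def align_streams(reference_ts, streams):
    """For each reference timestamp, take the nearest sample of every stream.

    `streams` maps a modality name to a (sorted timestamps, samples) pair.
    """
    aligned = []
    for t in reference_ts:
        frame = {"t": t}
        for name, (ts, samples) in streams.items():
            frame[name] = samples[nearest_index(ts, t)]
        aligned.append(frame)
    return aligned


# Hypothetical toy data in integer milliseconds (assumption, not from the paper):
# a slow camera clock and a faster force-torque stream.
camera_ts_ms = [0, 100, 200]
streams = {
    "force_z": ([0, 30, 60, 90, 120, 150, 180, 210], [0, 1, 2, 3, 4, 5, 6, 7]),
}
frames = align_streams(camera_ts_ms, streams)
```

Each entry of `frames` then holds one sample per modality for a single camera timestamp, which is the kind of per-step multi-modal record a temporal segmentation model could consume. Interpolation or hardware triggering are alternative designs; nothing here indicates which one TacUMI uses.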

Taxonomy

Topics: Robot Manipulation and Learning · Human Pose and Action Recognition · Social Robot Interaction and HRI