MimicFunc: Imitating Tool Manipulation from a Single Human Video via Functional Correspondence

Chao Tang; Anxing Xiao; Yuhong Deng; Tianrun Hu; Wenlong Dong; Hanbo Zhang; David Hsu; Hong Zhang

arXiv:2508.13534 · cs.RO · August 20, 2025

TL;DR

MimicFunc enables robots to learn tool manipulation skills from a single human video by establishing functional correspondences, allowing generalization to new tools and reducing the need for extensive teleoperation data.

Contribution

The paper introduces MimicFunc, a novel framework that uses a function-centric local coordinate frame to achieve one-shot imitation of tool manipulation from human videos.

Findings

01 Successfully generalizes manipulation skills to novel tools.

02 Enables training visuomotor policies without teleoperation data.

03 Demonstrates effective imitation from a single RGB-D video.

Abstract

Imitating tool manipulation from human videos offers an intuitive approach to teaching robots, while also providing a promising and scalable alternative to labor-intensive teleoperation data collection for visuomotor policy learning. While humans can mimic tool manipulation behavior by observing others perform a task just once and effortlessly transfer the skill to diverse tools for functionally equivalent tasks, current robots struggle to achieve this level of generalization. A key challenge lies in establishing function-level correspondences despite the significant geometric variations among functionally similar tools, referred to as intra-function variations. To address this challenge, we propose MimicFunc, a framework that establishes functional correspondences using a function frame, a function-centric local coordinate frame constructed with keypoint-based abstraction, for…
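To make the idea of a keypoint-based local coordinate frame concrete, here is a minimal sketch. It is not the paper's implementation. The abstract does not say which keypoints are used or how the axes are derived, so the three keypoints (`function_point`, `axis_point`, `plane_point`), the Gram-Schmidt construction, and all function names are assumptions made for illustration. The sketch builds an orthonormal frame on each tool, expresses a demonstrated waypoint in the demonstration tool's frame, and re-expresses it in the frame of a differently shaped tool.

```python
import math

# Small 3D vector helpers on plain tuples (standard library only).
def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def normalize(v):
    n = math.sqrt(dot(v, v))
    if n < 1e-9:
        raise ValueError("degenerate keypoints: zero-length axis")
    return tuple(x / n for x in v)

def build_frame(function_point, axis_point, plane_point):
    """Build a right-handed orthonormal frame from three keypoints.

    Hypothetical construction (an assumption, not taken from the paper):
    the origin sits at function_point, x points toward axis_point, and
    y is the part of (plane_point - origin) orthogonal to x.
    """
    x = normalize(sub(axis_point, function_point))
    v = sub(plane_point, function_point)
    proj = dot(v, x)
    # Gram-Schmidt step: remove the component of v along x.
    y = normalize(tuple(vi - proj * xi for vi, xi in zip(v, x)))
    z = cross(x, y)
    return function_point, (x, y, z)

def to_local(frame, p):
    """Coordinates of world point p in the given frame."""
    origin, axes = frame
    d = sub(p, origin)
    return tuple(dot(d, a) for a in axes)

def to_world(frame, q):
    """World coordinates of a point q given in frame coordinates."""
    origin, (x, y, z) = frame
    return tuple(origin[i] + q[0] * x[i] + q[1] * y[i] + q[2] * z[i]
                 for i in range(3))

def transfer_point(p, demo_frame, new_frame):
    """Keep p's frame-relative coordinates, but attach them to a new frame."""
    return to_world(new_frame, to_local(demo_frame, p))

if __name__ == "__main__":
    # Made-up keypoints for a demonstration tool and a novel tool.
    demo = build_frame((0, 0, 0), (1, 0, 0), (0, 1, 0))
    novel = build_frame((1, 2, 3), (1, 3, 3), (0, 2, 3))
    print(transfer_point((0.5, 0.2, 0.1), demo, novel))
```

Because the waypoint is stored relative to the frame rather than to the tool's overall geometry, the same local coordinates can be reused on a tool with a different shape, provided corresponding keypoints can be found on it. A full system would also need to transfer orientations and whole trajectories, which this sketch leaves out.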
