Grounded Task Axes: Zero-Shot Semantic Skill Generalization via Task-Axis Controllers and Visual Foundation Models
M. Yunus Seker, Shobhit Aggarwal, Oliver Kroemer

TL;DR
This paper introduces a zero-shot skill transfer method for robot manipulation that decomposes skills into grounded task-axis controllers linked to object keypoints, enabling generalization to new objects using foundation models.
Contribution
The paper combines grounded task-axis controllers with visual foundation models so that manipulation skills transfer zero-shot to novel objects.
Findings
Successful zero-shot transfer on real robots for screwing, pouring, and scraping.
Robust and versatile controller transfer demonstrated across diverse tasks.
Framework leverages foundation models for semantic keypoint detection.
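As a rough illustration of what semantic keypoint detection could look like, the sketch below matches one keypoint descriptor from an example object against a dense feature map of a target image using cosine similarity. It is an assumption-laden sketch, not the authors' pipeline: the function name `match_keypoint`, the array shapes, and the use of plain nearest-neighbour matching are all hypothetical, and the features would in practice come from some visual foundation model that is not specified here.

```python
import numpy as np

def match_keypoint(source_descriptor, target_features):
    """Find the target pixel whose feature is most similar to a source keypoint.

    source_descriptor: (D,) feature vector at the keypoint on the example object.
    target_features:   (H, W, D) dense feature map of the novel target image.
    Both are assumed to come from the same (unspecified) feature extractor.
    Returns ((row, col), cosine_similarity).
    """
    h, w, d = target_features.shape
    flat = target_features.reshape(-1, d)
    # Normalise so that the dot product equals cosine similarity.
    flat = flat / (np.linalg.norm(flat, axis=1, keepdims=True) + 1e-12)
    query = source_descriptor / (np.linalg.norm(source_descriptor) + 1e-12)
    sims = flat @ query
    idx = int(np.argmax(sims))
    return divmod(idx, w), float(sims[idx])

# Toy usage with random features standing in for real model outputs.
rng = np.random.default_rng(0)
features = rng.normal(size=(4, 5, 8))
descriptor = rng.normal(size=8)
features[2, 3] = 3.0 * descriptor  # plant a scaled copy of the descriptor
location, score = match_keypoint(descriptor, features)
```

Cosine similarity ignores feature magnitude, which is why the scaled copy planted at pixel (2, 3) is still the best match.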
Abstract
Transferring skills between different objects remains one of the core challenges of open-world robot manipulation. Generalization needs to take into account the high-level structural differences between distinct objects while still maintaining similar low-level interaction control. In this paper, we propose an example-based zero-shot approach to skill transfer. Rather than treating skills as atomic, we decompose skills into a prioritized list of grounded task-axis (GTA) controllers. Each GTAC defines an adaptable controller, such as a position or force controller, along an axis. Importantly, the GTACs are grounded in object key points and axes, e.g., the relative position of a screw head or the axis of its shaft. Zero-shot transfer is thus achieved by finding semantically-similar grounding features on novel target objects. We achieve this example-based grounding of the skills through…
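To make the idea of a prioritized list of axis controllers grounded in keypoints more concrete, here is a minimal sketch under stated assumptions. The class `TaskAxisController`, the helper names, and the rule that a lower-priority controller acts only along the part of its axis not already claimed by higher-priority ones are illustrative choices, not details confirmed by the abstract. Each controller here outputs a scalar velocity command along its axis; a force controller would need an additional force-to-velocity mapping that is omitted.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class TaskAxisController:
    """Hypothetical 1-D controller acting along an axis grounded in keypoints."""
    axis: np.ndarray   # direction in the world frame
    command: float     # desired velocity along that axis

def unit(v):
    """Return v scaled to unit length."""
    v = np.asarray(v, dtype=float)
    norm = np.linalg.norm(v)
    if norm < 1e-9:
        raise ValueError("axis has (near) zero length")
    return v / norm

def axis_from_keypoints(p_from, p_to):
    """Ground an axis in two object keypoints, e.g. screw head to screw tip."""
    return unit(np.asarray(p_to, dtype=float) - np.asarray(p_from, dtype=float))

def combine(controllers):
    """Sum commands in priority order (first = highest priority).

    Each controller acts only along the component of its axis that is
    orthogonal to every direction already claimed by earlier controllers.
    """
    total = np.zeros(3)
    claimed = []  # orthonormal directions used so far
    for ctrl in controllers:
        direction = unit(ctrl.axis)
        for b in claimed:
            direction = direction - np.dot(direction, b) * b
        norm = np.linalg.norm(direction)
        if norm < 1e-6:
            continue  # axis already fully controlled at higher priority
        direction = direction / norm
        total += ctrl.command * direction
        claimed.append(direction)
    return total

# Toy usage: move along a screw shaft first, then along a tilted secondary axis.
shaft = axis_from_keypoints([0.0, 0.0, 0.1], [0.0, 0.0, 0.0])
velocity = combine([
    TaskAxisController(axis=shaft, command=0.02),
    TaskAxisController(axis=np.array([1.0, 0.0, 1.0]), command=0.01),
])
```

Under this scheme, transferring the skill to a new object would amount to recomputing `shaft` from the keypoints detected on that object while leaving the controller list unchanged.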
Taxonomy
Topics: Robot Manipulation and Learning · Motor Control and Adaptation · Human Pose and Action Recognition
