Cross-Embodiment Dexterous Grasping with Reinforcement Learning

Haoqi Yuan, Bohan Zhou, Yuhui Fu, and Zongqing Lu

arXiv:2410.02479·cs.RO·October 4, 2024

Open Access · 3 Reviews

TL;DR

This paper introduces a reinforcement learning approach for cross-embodiment dexterous grasping, enabling a universal policy that generalizes across different robotic hands using eigengrasps and simplified proprioception.

Contribution

It proposes a universal action space based on eigengrasps and a unified observation space, allowing a single policy to control diverse dexterous hands with a high grasp success rate and to generalize zero-shot to unseen hands.
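To make the unified observation space concrete, here is a minimal sketch of a proprioception vector built only from palm and fingertip positions, so that hands with different joint counts produce observations of the same shape. The function name `unified_observation`, the fixed count of four fingertips (taken from Reviewer 02's reading of the paper), the flat concatenation order, and the sample coordinates are all assumptions made for illustration, not the authors' implementation.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]

# Assumption: four fingertips per hand; how other finger counts
# are handled is not covered by this summary.
NUM_FINGERTIPS = 4


def unified_observation(palm: Point, fingertips: Sequence[Point]) -> List[float]:
    """Concatenate the palm position and the fingertip positions into one flat vector."""
    if len(fingertips) != NUM_FINGERTIPS:
        raise ValueError(
            f"expected {NUM_FINGERTIPS} fingertips, got {len(fingertips)}"
        )
    obs = list(palm)
    for tip in fingertips:
        obs.extend(tip)
    return obs


# Two hypothetical hands with different joint counts: only the
# Cartesian points enter the observation, so the shapes match.
obs_a = unified_observation(
    (0.0, 0.0, 0.1),
    [(0.03, 0.0, 0.18), (0.01, 0.0, 0.2), (-0.01, 0.0, 0.2), (-0.03, 0.0, 0.18)],
)
obs_b = unified_observation(
    (0.0, 0.0, 0.12),
    [(0.04, 0.01, 0.2), (0.02, 0.0, 0.22), (0.0, 0.0, 0.22), (-0.02, 0.0, 0.21)],
)
print(len(obs_a), len(obs_b))
```

Because joint angles never enter the vector, the policy input does not depend on how many joints a given hand has.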

Findings

1. Achieved 80% grasp success rate across four robot embodiments.
2. Zero-shot generalization to two unseen robot hands.
3. Significant improvement in finetuning efficiency.

Abstract

Dexterous hands exhibit significant potential for complex real-world grasping tasks. While recent studies have primarily focused on learning policies for specific robotic hands, the development of a universal policy that controls diverse dexterous hands remains largely unexplored. In this work, we study the learning of cross-embodiment dexterous grasping policies using reinforcement learning (RL). Inspired by the capability of human hands to control various dexterous hands through teleoperation, we propose a universal action space based on the human hand's eigengrasps. The policy outputs eigengrasp actions that are then converted into specific joint actions for each robot hand through a retargeting mapping. We simplify the robot hand's proprioception to include only the positions of fingertips and the palm, offering a unified observation space across different robot hands. Our approach…
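The action pipeline described in the abstract can be sketched roughly as follows: a low-dimensional eigengrasp action is expanded into a hand pose as a linear combination of basis vectors, and a per-hand retargeting step turns that pose into joint targets. Everything concrete here is an assumption for illustration: the toy basis and mean pose, the names `eigengrasp_to_pose` and `make_toy_retarget`, and in particular the retargeting itself, which is replaced by a simple joint selection with clamping rather than the paper's retargeting mapping.

```python
from typing import Callable, List, Sequence


def eigengrasp_to_pose(
    coeffs: Sequence[float],
    mean_pose: Sequence[float],
    basis: Sequence[Sequence[float]],
) -> List[float]:
    """pose = mean_pose + sum_k coeffs[k] * basis[k]"""
    if len(coeffs) != len(basis):
        raise ValueError("need one coefficient per eigengrasp")
    pose = list(mean_pose)
    for a, vec in zip(coeffs, basis):
        if len(vec) != len(pose):
            raise ValueError("eigengrasp and mean pose dimensions differ")
        for j, v in enumerate(vec):
            pose[j] += a * v
    return pose


def make_toy_retarget(
    joint_index: Sequence[int], lower: Sequence[float], upper: Sequence[float]
) -> Callable[[Sequence[float]], List[float]]:
    """Hypothetical stand-in for retargeting: copy selected pose
    entries and clamp them to the robot hand's joint limits."""

    def retarget(pose: Sequence[float]) -> List[float]:
        return [
            min(max(pose[i], lo), hi)
            for i, lo, hi in zip(joint_index, lower, upper)
        ]

    return retarget


# Toy 4-dimensional hand pose with two made-up eigengrasps.
MEAN_POSE = [0.25, 0.25, 0.25, 0.25]
BASIS = [[1.0, 1.0, 1.0, 1.0], [1.0, -1.0, 0.0, 0.0]]

action = [0.5, 0.125]  # what the policy would output
pose = eigengrasp_to_pose(action, MEAN_POSE, BASIS)

# A hypothetical three-joint robot hand with limits [0.0, 0.8].
three_joint_hand = make_toy_retarget([0, 1, 3], [0.0] * 3, [0.8] * 3)
joint_targets = three_joint_hand(pose)
print(pose, joint_targets)
```

The policy only ever sees the short `action` vector; swapping in a different hand means swapping the retargeting function, not the policy output.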

Peer Reviews

Decision·ICLR 2025 Poster

Reviewer 01 · Rating 6 · Confidence 4

Strengths

- The authors conducted a comprehensive benchmark and ablation study.
- The presentation of the paper is clear and well-structured.

Weaknesses

- Only LEAP Hand was tested in the real-world experiments.
- Some failure cases should be included in the video.
- Joint position of the arm is included for training policies, which might affect the generalizability to different arms.
- More design choices should be analyzed regarding the impact on generalizability. See the questions below.

Reviewer 02 · Rating 6 · Confidence 4

Strengths

This paper presents a clear story on training policies with multiple embodiments. The paper is well-written and illustrates the motivation from human teleoperation well. The design of the action space (i.e., a linear combination of eigengrasp coefficients) is interesting. The idea of using a neural network to approximate the retargeting is also interesting. The authors also show that the policy can successfully transfer to the real world by training a student policy with vision as the input.

Weaknesses

From my understanding, the observation space contains four fingertip positions. It is not discussed how to deal with three or five fingers, or if the method is specifically designed for anthropomorphic hands. If this is an intrinsic assumption or limitation, it should be discussed in the main paper.

The authors design several ablation experiments to show the importance of the proposed observation and action space. However, there is still one design missing: What if we train a policy that direct

Reviewer 03 · Rating 3 · Confidence 3

Strengths

**Relevance of the Problem**
- The huge diversity of embodiments presents a significant challenge in embodied AI, making research on cross-embodiment learning highly relevant.

**Method**
- Leveraging pre-trained neural networks to approximate the optimization-based retargeting process across different embodiments is an intuitive approach to ensure efficient, parallelized training.

**Presentation**
- The writing is clear, and the logical flow of the paper effectively outlines the studied problem and

Weaknesses

**Method and Evaluation**
- The baselines used in the evaluations are ablations of the proposed method and do not appear to be particularly strong for this task.
- Training on 4 embodiments and transferring to 2 novel ones is indeed quite limited, as acknowledged in the limitations section. However, attributing this limitation to 'restricted access to dexterous hand models' is not a compelling justification. Incorporating a simulation with a broader range of dexterous hand models and exploring h

Taxonomy

Topics: Robot Manipulation and Learning · Evolutionary Algorithms and Applications · Reinforcement Learning in Robotics