Signs of Language: Embodied Sign Language Fingerspelling Acquisition from Demonstrations for Human-Robot Interaction
Federico Tavella, Aphrodite Galata, Angelo Cangelosi

TL;DR
This paper presents a method for robots to learn sign language fingerspelling from video demonstrations, combining pre-trained deep vision models with reinforcement learning to imitate fine-grained hand movements without any information beyond the videos themselves.
Contribution
It introduces a novel approach combining vision-based pose extraction and reinforcement learning for robotic fingerspelling acquisition from videos.
Findings
Successfully imitated six fingerspelled letters.
Demonstrated generalization across multiple tasks.
Achieved accurate reproduction of fine-grained movements.
Abstract
Learning fine-grained movements is a challenging topic in robotics, particularly in the context of robotic hands. One specific instance of this challenge is the acquisition of fingerspelling sign language in robots. In this paper, we propose an approach for learning dexterous motor imitation from video examples without additional information. To achieve this, we first build a URDF model of a robotic hand with a single actuator for each joint. We then leverage pre-trained deep vision models to extract the 3D pose of the hand from RGB videos. Next, using state-of-the-art reinforcement learning algorithms for motion imitation (namely, proximal policy optimization and soft actor-critic), we train a policy to reproduce the movement extracted from the demonstrations. We identify the optimal set of hyperparameters for imitation based on a reference motion. Finally, we demonstrate the…
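As a rough illustration of the imitation step described in the abstract, the sketch below scores how closely a simulated hand pose matches a reference pose extracted from video. It is a minimal example of a tracking reward of the kind commonly used in motion-imitation reinforcement learning, not the reward from the paper; the function name `imitation_reward`, the `scale` parameter, and the representation of poses as flat lists of joint angles in radians are all assumptions made for illustration.

```python
import math


def imitation_reward(joint_angles, reference_angles, scale=2.0):
    """Hypothetical tracking reward: exp(-scale * squared pose error).

    joint_angles     -- current joint angles of the simulated hand (radians)
    reference_angles -- target joint angles taken from the demonstration
    scale            -- assumed sharpness factor; larger values penalise
                        deviations more strongly
    Returns a value in (0, 1], equal to 1 only for a perfect match.
    """
    if len(joint_angles) != len(reference_angles):
        raise ValueError("pose vectors must have the same length")
    squared_error = sum(
        (q - r) ** 2 for q, r in zip(joint_angles, reference_angles)
    )
    return math.exp(-scale * squared_error)
```

A policy trained with proximal policy optimization or soft actor-critic would receive a reward of this kind at each time step, so that maximising return pushes the hand toward the demonstrated trajectory.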
Taxonomy
Topics: Human Pose and Action Recognition · Hand Gesture Recognition Systems · Robot Manipulation and Learning
