Where is my hand? Deep hand segmentation for visual self-recognition in humanoid robots
Alexandre Almeida, Pedro Vicente, Alexandre Bernardino

TL;DR
This paper presents a CNN-based method for segmenting humanoid robot hands using synthetic data and domain randomization, enabling accurate self-recognition crucial for manipulation and interaction tasks.
Contribution
It introduces a low-data, domain-randomized training approach for Mask R-CNN that segments robot hands while requiring minimal real-world data.
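As a rough illustration of what domain randomization means in practice, the sketch below samples random scene parameters for each synthetic image. It is not the authors' pipeline: the `SceneParams` fields, the value ranges, the number of backgrounds, and the number of joints are all hypothetical placeholders chosen for the example. Only the dataset size of 1000 images comes from the summary above.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class SceneParams:
    """Hypothetical parameters describing one randomized synthetic scene."""
    background_id: int         # index into an assumed pool of background images
    light_intensity: float     # assumed relative brightness multiplier
    hue_shift: float           # assumed color perturbation applied to the hand
    joint_angles: List[float]  # assumed hand/arm joint angles in radians


def sample_scene(rng: random.Random, n_backgrounds: int = 100, n_joints: int = 9) -> SceneParams:
    """Draw one scene configuration uniformly at random from illustrative ranges."""
    return SceneParams(
        background_id=rng.randrange(n_backgrounds),
        light_intensity=rng.uniform(0.3, 1.5),
        hue_shift=rng.uniform(-0.1, 0.1),
        joint_angles=[rng.uniform(-1.0, 1.0) for _ in range(n_joints)],
    )


# Seeded generator so the sampled dataset description is reproducible.
rng = random.Random(0)
scenes = [sample_scene(rng) for _ in range(1000)]
```

Each sampled configuration would then be handed to a renderer that produces the image and its ground-truth hand mask; the renderer itself is outside the scope of this sketch.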
Findings
Achieved 82% IoU on synthetic validation data.
Achieved 56.3% IoU on real test data.
Training required only 1000 images and 3 hours on a single GPU.
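The IoU (Intersection over Union) figures above measure the overlap between a predicted hand mask and the ground-truth mask: the number of pixels labeled as hand in both, divided by the number labeled as hand in either. A minimal sketch of the metric for a single pair of binary masks follows, assuming masks are given as nested lists of 0/1. The function name `mask_iou` and the empty-mask convention are choices made for this example, and the paper may aggregate the metric differently across images.

```python
def mask_iou(pred, target):
    """Intersection over Union of two binary masks (equal-sized nested lists of 0/1)."""
    if len(pred) != len(target) or any(len(p) != len(t) for p, t in zip(pred, target)):
        raise ValueError("masks must have the same shape")
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            p, t = bool(p), bool(t)
            intersection += p and t  # pixel is hand in both masks
            union += p or t          # pixel is hand in at least one mask
    # Two empty masks agree perfectly; define IoU as 1.0 by convention.
    return intersection / union if union else 1.0


pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
iou = mask_iou(pred, target)  # 2 shared pixels / 4 pixels in the union = 0.5
```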
Abstract
The ability to distinguish between the self and the background is of paramount importance for robotic tasks. Hands in particular, as the end effectors of a robotic system that more often come into contact with other elements of the environment, must be perceived and tracked with precision to execute the intended tasks with dexterity and without colliding with obstacles. They are fundamental for several applications, from Human-Robot Interaction tasks to object manipulation. Modern humanoid robots are characterized by a high number of degrees of freedom, which makes their forward kinematics models very sensitive to uncertainty. Thus, resorting to vision sensing can be the only solution to endow these robots with a good perception of the self, being able to localize their body parts with precision. In this paper, we propose the use of a Convolutional Neural Network (CNN) to segment…
Taxonomy
Methods: Convolution
