Vision-Based Hand Shadowing for Robotic Manipulation via Inverse Kinematics
Hendrik Chiche, Antoine Jamme, Trevor Rigoberto Martinez, Gabriel Gomes

TL;DR
This paper introduces a vision-based inverse kinematics pipeline that shadows human hand motion to teleoperate a robotic manipulator using a single egocentric RGB-D camera, and evaluates it on structured and real-world tasks.
Contribution
The work presents a novel offline retargeting pipeline combining hand landmark detection, 3D deprojection, and IK solving for robotic control, including a gripper controller and simulation validation.
Findings
Achieved an 86.7% success rate on the structured pick-and-place benchmark.
Reported a mean IK position error of 36.4 mm and a significant jerk reduction with smoothing.
Improved the hand detection rate by 8% using WiLoR instead of MediaPipe.
Abstract
Teleoperation of low-cost robotic manipulators remains challenging due to the difficulty of retargeting human hand motion to robot joint commands. We present an offline hand-shadowing inverse-kinematics (IK) retargeting pipeline driven by a single egocentric RGB-D camera mounted on 3D-printed glasses. The pipeline detects 21 hand landmarks per hand using MediaPipe Hands, deprojects them into 3D via depth sensing, transforms them into the robot coordinate frame, and solves a damped-least-squares IK problem to produce joint commands for the SO-ARM101 robot (5 arm + 1 gripper joints). A gripper controller maps thumb-index finger geometry to grasp aperture with a multi-level fallback hierarchy. Actions are previewed in a physics simulation before replay on the physical robot. We evaluate the pipeline on a structured pick-and-place benchmark (5-tile grid, 10 grasps per tile, 3 independent…
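The geometric core of the pipeline described in the abstract (deprojection, frame transform, damped-least-squares IK) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the pinhole intrinsics `fx`, `fy`, `cx`, `cy`, the camera-to-robot transform `T_robot_cam`, the damping value, and all function names are assumptions introduced here, and the Jacobian is taken as an input rather than derived from the SO-ARM101 model.

```python
import numpy as np

def deproject(u, v, depth_m, fx, fy, cx, cy):
    """Back-project pixel (u, v) with depth in metres to a camera-frame
    3D point, assuming an ideal pinhole model without distortion."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return np.array([x, y, depth_m])

def to_robot_frame(p_cam, T_robot_cam):
    """Map a camera-frame point into the robot frame with a 4x4
    homogeneous transform (hypothetical calibration result)."""
    return (T_robot_cam @ np.append(p_cam, 1.0))[:3]

def dls_step(J, err, damping=0.05):
    """One damped-least-squares update:
    dq = J^T (J J^T + damping^2 I)^-1 err,
    where J is the (m x n) task Jacobian and err the m-dim task error.
    The damping value is an illustrative assumption."""
    JJt = J @ J.T
    reg = damping ** 2 * np.eye(JJt.shape[0])
    return J.T @ np.linalg.solve(JJt + reg, err)
```

In use, a detected landmark pixel and its depth reading would pass through `deproject` and `to_robot_frame` to give a target position, and `dls_step` would be iterated on the position error until it falls below a tolerance. The damping term trades tracking accuracy for stability near singular configurations, which is the usual motivation for choosing damped least squares over a plain pseudoinverse.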
