Dexterity from Smart Lenses: Multi-Fingered Robot Manipulation with In-the-Wild Human Demonstrations

Irmak Guzey; Haozhi Qi; Julen Urain; Changhao Wang; Jessica Yin; Krishna Bodduluri; Mike Lambeta; Lerrel Pinto; Akshara Rai; Jitendra Malik; Tingfan Wu; Akash Sharma; Homanga Bharadhwaj

arXiv:2511.16661 · cs.RO · November 21, 2025

Open Access

TL;DR

This paper introduces AINA, a framework that enables learning multi-fingered robot manipulation policies directly from in-the-wild human demonstrations using lightweight smart glasses, reducing the need for robot-specific data collection.

Contribution

The paper presents AINA, a novel approach that leverages portable smart glasses to collect human manipulation data for training robot policies without robot data or simulation.

Findings

01. Robust multi-fingered policies learned from human demonstrations.
02. Effective transfer of policies to real robots across diverse tasks.
03. Superior performance compared to prior human-to-robot learning methods.

Abstract

Learning multi-fingered robot policies from humans performing daily tasks in natural environments has long been a grand goal in the robotics community. Achieving this would mark significant progress toward generalizable robot manipulation in human environments, as it would reduce the reliance on labor-intensive robot data collection. Despite substantial efforts, progress toward this goal has been bottlenecked by the embodiment gap between humans and robots, as well as by difficulties in extracting relevant contextual and motion cues that enable learning of autonomous policies from in-the-wild human videos. We claim that with simple yet sufficiently powerful hardware for obtaining human data and our proposed framework AINA, we are now one significant step closer to achieving this dream. AINA enables learning multi-fingered policies from data collected by anyone, anywhere, and in any…

Taxonomy

Topics: Robot Manipulation and Learning · Human Pose and Action Recognition · Social Robot Interaction and HRI