Emergent Active Perception and Dexterity of Simulated Humanoids from Visual Reinforcement Learning

Zhengyi Luo; Chen Tessler; Toru Lin; Ye Yuan; Tairan He; Wenli Xiao; Yunrong Guo; Gal Chechik; Kris Kitani; Linxi Fan; Yuke Zhu

arXiv:2505.12278 · cs.RO · May 20, 2025

TL;DR

This paper presents Perceptive Dexterous Control (PDC), a vision-based framework enabling simulated humanoids to perform complex household tasks through active perception and reinforcement learning, without relying on privileged state information.

Contribution

Introduces PDC, a vision-driven control framework in which simulated humanoids learn, through reinforcement learning on egocentric vision, a single policy that performs multiple tasks and exhibits emergent active search behaviors.

Findings

1. PDC enables humanoids to perform object search, grasping, and manipulation tasks.
2. Reinforcement learning from scratch produces human-like active search behaviors.
3. The approach demonstrates the importance of perception-action loops in embodied AI.

Abstract

Human behavior is fundamentally shaped by visual perception: our ability to interact with the world depends on actively gathering relevant information and adapting our movements accordingly. Behaviors like searching for objects, reaching, and hand-eye coordination naturally emerge from the structure of our sensory system. Inspired by these principles, we introduce Perceptive Dexterous Control (PDC), a framework for vision-driven dexterous whole-body control with simulated humanoids. PDC operates solely on egocentric vision for task specification, enabling object search, target placement, and skill selection through visual cues, without relying on privileged state information (e.g., 3D object positions and geometries). This perception-as-interface paradigm enables learning a single policy to perform multiple household tasks, including reaching, grasping, placing, and articulated object…
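The abstract states that tasks are specified through visual cues in the egocentric view rather than through privileged state such as 3D object positions, but it does not say how those cues are rendered. As an illustration only, the sketch below assumes the target object is marked by tinting its pixels in the camera frame. The cue color, blending weight, the proprioception field, and all names (`overlay_cue`, `VisionOnlyObservation`, `build_observation`) are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical representation: an image is an H x W grid of (r, g, b) pixels in [0, 255].
Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]
Mask = List[List[bool]]


def overlay_cue(image: Image, mask: Mask,
                color: Pixel = (0, 255, 0), alpha: float = 0.5) -> Image:
    """Return a copy of `image` with `color` alpha-blended into masked pixels.

    Assumed cue encoding for illustration; the paper's actual scheme may differ.
    """
    out: Image = []
    for px_row, mask_row in zip(image, mask):
        new_row: List[Pixel] = []
        for px, selected in zip(px_row, mask_row):
            if selected:
                # Linear blend between the original pixel and the cue color.
                px = tuple(round((1 - alpha) * c + alpha * k)
                           for c, k in zip(px, color))
            new_row.append(px)
        out.append(new_row)
    return out


@dataclass(frozen=True)
class VisionOnlyObservation:
    rgb: Image                    # egocentric frame with the task cue drawn in
    proprioception: List[float]   # assumed: the humanoid's own joint state
    # Intentionally no object position or geometry fields (no privileged state).


def build_observation(frame: Image, target_mask: Mask,
                      proprio: List[float]) -> VisionOnlyObservation:
    """Hypothetical helper: task specification reaches the policy only through pixels."""
    return VisionOnlyObservation(rgb=overlay_cue(frame, target_mask),
                                 proprioception=list(proprio))
```

The point being illustrated is structural: the observation type has no field for object pose or geometry, so whatever identifies the target must be visible in the image the policy sees.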

Taxonomy

Topics: Robot Manipulation and Learning · Motor Control and Adaptation · Social Robot Interaction and HRI

Methods: Prime Dilated Convolution