Vision in Action: Learning Active Perception from Human Demonstrations

Haoyu Xiong; Xiaomeng Xu; Jimmy Wu; Yifan Hou; Jeannette Bohg; Shuran Song

arXiv:2506.15666 · cs.RO · June 19, 2025

TL;DR

This paper introduces ViA, an active perception system for bimanual robots that learns human-like perceptual strategies through VR teleoperation, enabling robust manipulation in complex tasks with visual occlusions.

Contribution

ViA is the first system to learn active perceptual strategies from human demonstrations using VR, integrating a robotic neck and shared observation space for improved robot perception.

Findings

01 ViA significantly outperforms baseline systems in complex manipulation tasks.

02 The VR interface effectively captures human perceptual strategies despite latency issues.

03 Robust visuomotor policies are learned for multi-stage tasks involving occlusions.

Abstract

We present Vision in Action (ViA), an active perception system for bimanual robot manipulation. ViA learns task-relevant active perceptual strategies (e.g., searching, tracking, and focusing) directly from human demonstrations. On the hardware side, ViA employs a simple yet effective 6-DoF robotic neck to enable flexible, human-like head movements. To capture human active perception strategies, we design a VR-based teleoperation interface that creates a shared observation space between the robot and the human operator. To mitigate VR motion sickness caused by latency in the robot's physical movements, the interface uses an intermediate 3D scene representation, enabling real-time view rendering on the operator side while asynchronously updating the scene with the robot's latest observations. Together, these design elements enable the learning of robust visuomotor policies for three…
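
The decoupling described in the abstract, where the operator's view is rendered in real time from an intermediate 3D scene that is refreshed asynchronously, can be illustrated with a minimal sketch. The code below is not the authors' implementation: it assumes the scene is a plain point cloud in the world frame and that the view is a simple pinhole projection, and the names `SceneBuffer`, `render_view`, and `head_pose` are hypothetical. A single loop stands in for the two asynchronous processes, with the scene updated only occasionally while a view is rendered at every step from the operator's latest head pose.

```python
import numpy as np


class SceneBuffer:
    """Hypothetical stand-in for the intermediate 3D scene (assumed: a world-frame point cloud)."""

    def __init__(self):
        self.points = np.empty((0, 3))
        self.version = 0

    def update(self, points_world):
        # Called only when a new robot observation arrives (slow path).
        self.points = np.asarray(points_world, dtype=float).reshape(-1, 3)
        self.version += 1


def head_pose(x_offset):
    """Hypothetical operator head pose: identity rotation, translated along x."""
    T = np.eye(4)
    T[0, 3] = x_offset
    return T


def render_view(points_world, T_world_from_head, focal=100.0, size=64):
    """Project the stored points into a pinhole camera at the current head pose (fast path)."""
    T_head_from_world = np.linalg.inv(T_world_from_head)
    homo = np.hstack([points_world, np.ones((len(points_world), 1))])
    cam = (T_head_from_world @ homo.T).T[:, :3]
    cam = cam[cam[:, 2] > 1e-6]  # keep only points in front of the camera
    u = np.round(focal * cam[:, 0] / cam[:, 2] + size / 2).astype(int)
    v = np.round(focal * cam[:, 1] / cam[:, 2] + size / 2).astype(int)
    inside = (u >= 0) & (u < size) & (v >= 0) & (v < size)
    image = np.zeros((size, size), dtype=bool)
    image[v[inside], u[inside]] = True
    return image


# Simulated session: the scene is refreshed every 10 steps,
# while a view is rendered at every step from the newest head pose.
scene = SceneBuffer()
scene.update([[0.0, 0.0, 1.0]])
frames = []
for step in range(30):
    if step > 0 and step % 10 == 0:
        scene.update([[0.0, 0.0, 1.0], [0.05, 0.0, 1.0]])  # assumed new observation
    frames.append(render_view(scene.points, head_pose(0.01 * step)))
```

In this sketch the rendering rate depends only on how fast `render_view` runs, not on how often `update` is called, which is the property the abstract attributes to the intermediate scene representation. A real system would run the two paths in separate threads or processes.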

Taxonomy

Topics: Data Visualization and Analytics