AGIL: Learning Attention from Human for Visuomotor Tasks
Ruohan Zhang, Zhuode Liu, Luxin Zhang, Jake A. Whritner, Karl S. Muller, Mary M. Hayhoe, Dana H. Ballard

TL;DR
This paper introduces AGIL, a framework that leverages human gaze data to guide imitation learning in visuomotor tasks, significantly enhancing agent performance by integrating human visual attention into policy models.
Contribution
The paper presents a novel approach that uses human gaze prediction to improve imitation learning for visuomotor tasks, demonstrated through Atari game experiments.
Findings
Gaze prediction model achieves high accuracy in inferring human visual attention.
Incorporating gaze-based attention improves action prediction accuracy.
Agents with gaze-guided attention outperform baseline models in task performance.
Abstract
When intelligent agents learn visuomotor behaviors from human demonstrations, they may benefit from knowing where the human is allocating visual attention, which can be inferred from their gaze. A wealth of information regarding intelligent decision making is conveyed by human gaze allocation; hence, exploiting such information has the potential to improve the agents' performance. With this motivation, we propose the AGIL (Attention Guided Imitation Learning) framework. We collect high-quality human action and gaze data while playing Atari games in a carefully controlled experimental setting. Using these data, we first train a deep neural network that can predict human gaze positions and visual attention with high accuracy (the gaze network) and then train another network to predict human actions (the policy network). Incorporating the learned attention model from the gaze network into…
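One way to picture how an attention map could be fed into a policy model is sketched below. This is a minimal illustration, not the paper's implementation: the abstract is truncated before it describes the integration, so the element-wise masking, the 84x84 frame size, the Gaussian stand-in for the gaze network's output, and the helper names `gaze_heatmap` and `mask_frame` are all assumptions made for this example.

```python
import numpy as np


def gaze_heatmap(height, width, gaze_xy, sigma=5.0):
    """Hypothetical stand-in for a gaze network's output.

    Builds an isotropic Gaussian centered on one gaze point (x, y)
    and normalizes it so the map sums to 1, like a probability map.
    """
    ys, xs = np.mgrid[0:height, 0:width]
    gx, gy = gaze_xy
    heat = np.exp(-((xs - gx) ** 2 + (ys - gy) ** 2) / (2.0 * sigma ** 2))
    return heat / heat.sum()


def mask_frame(frame, heatmap):
    """Assumed integration step: weight each pixel by the attention map.

    The map is rescaled so its peak equals 1, which keeps the pixel
    at the gaze location at its original intensity.
    """
    return frame * (heatmap / heatmap.max())


# Toy input: a uniform grayscale frame (84x84 is an assumed size).
frame = np.ones((84, 84), dtype=np.float64)
heat = gaze_heatmap(84, 84, gaze_xy=(20, 60))
masked = mask_frame(frame, heat)
```

In a full pipeline, the masked frame (possibly alongside the unmasked one) would be passed to a policy network that predicts the human action; here the sketch stops at the masking step because that is the only part the summary above hints at.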
Taxonomy
Topics: Visual Attention and Saliency Detection · Gaze Tracking and Assistive Technology · Human Pose and Action Recognition
