AGIL: Learning Attention from Human for Visuomotor Tasks

Ruohan Zhang; Zhuode Liu; Luxin Zhang; Jake A. Whritner; Karl S. Muller; Mary M. Hayhoe; Dana H. Ballard

arXiv:1806.03960 · cs.CV · June 12, 2018 · 6 cites

TL;DR

This paper introduces AGIL, a framework that leverages human gaze data to guide imitation learning in visuomotor tasks, significantly enhancing agent performance by integrating human visual attention into policy models.

Contribution

The paper presents a novel approach that uses human gaze prediction to improve imitation learning for visuomotor tasks, demonstrated through Atari game experiments.

Findings

1. Gaze prediction model achieves high accuracy in inferring human visual attention.

2. Incorporating gaze-based attention improves action prediction accuracy.

3. Agents with gaze-guided attention outperform baseline models in task performance.

Abstract

When intelligent agents learn visuomotor behaviors from human demonstrations, they may benefit from knowing where the human is allocating visual attention, which can be inferred from their gaze. A wealth of information regarding intelligent decision making is conveyed by human gaze allocation; hence, exploiting such information has the potential to improve the agents' performance. With this motivation, we propose the AGIL (Attention Guided Imitation Learning) framework. We collect high-quality human action and gaze data while playing Atari games in a carefully controlled experimental setting. Using these data, we first train a deep neural network that can predict human gaze positions and visual attention with high accuracy (the gaze network) and then train another network to predict human actions (the policy network). Incorporating the learned attention model from the gaze network into…
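The abstract is truncated here and does not say how the learned attention is combined with the policy network. As a rough illustration only, one plausible way to let a gaze-derived attention map modulate an input frame is elementwise masking. Everything in the sketch below is an assumption made for illustration and is not taken from the paper: the function names `gaze_heatmap` and `apply_attention`, the Gaussian form of the heatmap, the `sigma` value, and the 84x84 frame size.

```python
import numpy as np

def gaze_heatmap(height, width, gaze_xy, sigma=5.0):
    """Hypothetical helper: 2D Gaussian centered on gaze_xy = (x, y), normalized to sum to 1."""
    ys, xs = np.mgrid[0:height, 0:width]
    gx, gy = gaze_xy
    heat = np.exp(-((xs - gx) ** 2 + (ys - gy) ** 2) / (2.0 * sigma ** 2))
    return heat / heat.sum()

def apply_attention(frame, heatmap):
    """Hypothetical helper: weight each pixel by the heatmap rescaled so its peak is 1."""
    mask = heatmap / heatmap.max()
    return frame * mask

# Toy input: a uniform grayscale frame (assumed size) and an assumed gaze position.
frame = np.ones((84, 84), dtype=np.float32)
heat = gaze_heatmap(84, 84, gaze_xy=(20, 60))
masked = apply_attention(frame, heat)  # brightest at row 60, column 20
```

In a real pipeline the heatmap would come from the trained gaze network rather than from a known gaze position, and the masked frame would be one of the inputs to the policy network.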

Taxonomy

Topics: Visual Attention and Saliency Detection · Gaze Tracking and Assistive Technology · Human Pose and Action Recognition