Learning to predict where to look in interactive environments using deep recurrent q-learning

Sajad Mousavi; Michael Schukat; Enda Howley; Ali Borji; Nasser Mozayani

arXiv:1612.05753 · cs.CV · February 21, 2017 · 26 cites

Open Access

TL;DR

This paper introduces a reinforcement learning approach with a soft attention mechanism to predict where to look in interactive environments, outperforming traditional bottom-up saliency models in complex tasks like Atari games.

Contribution

It combines deep Q-learning with a soft attention mechanism to improve prediction of task-relevant fixation points in interactive environments.

Findings

01 Outperforms bottom-up saliency models in predicting fixation locations.

02 Effective in complex interactive tasks like Atari games.

03 Demonstrates improved focus on relevant visual input.
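
As a rough illustration of what "predicting fixation locations" can mean in practice, the sketch below scores a saliency map against fixation points with Normalized Scanpath Saliency (NSS), one widely used fixation-prediction metric. This page does not state which metric the paper used, so the choice of NSS, the function name `nss`, and the toy 2x2 map are all assumptions made for illustration.

```python
import math

def nss(saliency, fixations):
    """Normalized Scanpath Saliency (illustrative helper, not from the paper).

    saliency:  2D list of non-negative saliency values (rows x cols).
    fixations: list of (row, col) fixation coordinates.
    Returns the mean z-scored saliency value at the fixated locations.
    """
    values = [v for row in saliency for v in row]
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    if std == 0:
        # A flat map carries no information about where to look.
        return 0.0
    z_at_fixations = [(saliency[r][c] - mean) / std for r, c in fixations]
    return sum(z_at_fixations) / len(z_at_fixations)

# Toy map with a single salient cell at (1, 1).
saliency = [[0.0, 0.0],
            [0.0, 1.0]]
score_hit = nss(saliency, [(1, 1)])   # fixation lands on the salient cell
score_miss = nss(saliency, [(0, 0)])  # fixation lands on a non-salient cell
```

Under this metric, a higher score means the map assigns above-average saliency to the fixated locations, so `score_hit` is positive and `score_miss` is negative here.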

Abstract

Bottom-Up (BU) saliency models do not perform well in complex interactive environments where humans are actively engaged in tasks (e.g., sandwich making and playing video games). In this paper, we leverage Reinforcement Learning (RL) to highlight task-relevant locations of input frames. We propose a soft attention mechanism combined with the Deep Q-Network (DQN) model to teach an RL agent how to play a game and where to look by focusing on the most pertinent parts of its visual input. Our evaluations on several Atari 2600 games show that the soft-attention-based model could predict fixation locations significantly better than bottom-up models such as the Itti-Koch saliency and Graph-Based Visual Saliency (GBVS) models.
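
The soft attention idea mentioned in the abstract can be sketched generically as follows. This is a minimal, assumption-laden illustration and not the authors' architecture: the dot-product scoring, the `query` vector (standing in for something like a recurrent hidden state), and the toy 2x2 feature grid are hypothetical choices. It shows only the core step: a softmax over spatial locations yields weights that both summarize the input and can be read as a "where to look" map.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def soft_attention(features, query):
    """Generic soft attention over spatial locations (illustrative only).

    features: list of L location vectors, each of length D
              (e.g. a flattened convolutional feature map).
    query:    vector of length D (hypothetical stand-in for agent state).
    Returns (weights, context): one weight per location, and the
    weighted sum of the location vectors.
    """
    # Score each location by its dot product with the query (an assumption).
    scores = [sum(f_d * q_d for f_d, q_d in zip(f, query)) for f in features]
    weights = softmax(scores)
    dim = len(query)
    context = [sum(w * f[d] for w, f in zip(weights, features))
               for d in range(dim)]
    return weights, context

# Toy 2x2 grid of locations, flattened row by row, with D = 2.
features = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0], [0.0, 0.0]]
query = [1.0, 0.0]
weights, context = soft_attention(features, query)

# Reshape the weights back to the 2x2 grid to read them as a saliency map.
saliency_map = [weights[0:2], weights[2:4]]
```

In a setup like the one the abstract describes, the `context` vector would feed the part of the network that estimates Q-values, while `saliency_map` would be the quantity compared against human fixations; both of those connections are assumptions here rather than details taken from this page.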

Taxonomy

Topics: Visual Attention and Saliency Detection · Image and Video Quality Assessment · Virtual Reality Applications and Impacts