Learning Actions and Control of Focus of Attention with a Log-Polar-like Sensor
Robin Göransson, Volker Krueger

TL;DR
This paper demonstrates that log-polar-like image data, combined with learned gaze control and deep reinforcement learning, can shrink the 80 by 80 pixel input used for Atari games by a factor of 5 without losing game performance.
Contribution
It combines a log-polar-like image representation with deep RL, learning a gaze-control policy alongside the game-playing policy so that less image data is needed at the same performance.
Findings
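Reduced the number of image pixels by a factor of 5 without losing game performance
Extended A3C with an LSTM network to learn a game-playing policy and a gaze-control policy on three Atari games
Gaze control operates on the log-polar-like image data rather than on the Cartesian image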
Abstract
With the long-term goal of reducing the image processing time on an autonomous mobile robot in mind, we explore in this paper the use of log-polar-like image data with gaze control. The gaze control is not done on the Cartesian image but on the log-polar-like image data. For this we start out from the classic deep reinforcement learning approach for Atari games. We extend an A3C deep RL approach with an LSTM network, and we learn the policy for playing three Atari games and a policy for gaze control. While the Atari games already use low-resolution images of 80 by 80 pixels, we are able to further reduce the number of image pixels by a factor of 5 without losing any gaming performance.
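To make the idea of a log-polar-like sensor with a movable gaze point concrete, here is a minimal sketch in Python with NumPy. It is not the authors' sensor model: the function name `log_polar_sample`, the 32 by 40 grid, the minimum radius, and the nearest-neighbour lookup are all assumptions chosen for illustration. The grid size is picked only so that the output has 1280 samples, one fifth of the 6400 pixels in an 80 by 80 frame.

```python
import numpy as np


def log_polar_sample(image, gaze_xy, n_rings=32, n_wedges=40, r_min=1.0, r_max=None):
    """Sample a 2-D grayscale image on a log-polar grid centred at gaze_xy.

    Hypothetical illustration: grid size, radii, and nearest-neighbour
    lookup are assumptions, not the layout used in the paper.
    """
    h, w = image.shape
    cx, cy = gaze_xy
    if r_max is None:
        # Image diagonal, so the outer rings can reach any pixel from any gaze point.
        r_max = float(np.hypot(h, w))

    # Ring radii grow geometrically: dense near the gaze point, sparse far away.
    radii = r_min * (r_max / r_min) ** (np.arange(n_rings) / (n_rings - 1))
    angles = 2.0 * np.pi * np.arange(n_wedges) / n_wedges
    rr, aa = np.meshgrid(radii, angles, indexing="ij")

    # Nearest Cartesian pixel for every (ring, wedge) cell.
    xs = np.rint(cx + rr * np.cos(aa)).astype(int)
    ys = np.rint(cy + rr * np.sin(aa)).astype(int)

    # Cells that fall outside the frame stay zero.
    inside = (xs >= 0) & (xs < w) & (ys >= 0) & (ys < h)
    out = np.zeros((n_rings, n_wedges), dtype=image.dtype)
    out[inside] = image[ys[inside], xs[inside]]
    return out


frame = np.arange(80 * 80, dtype=np.float32).reshape(80, 80)
observation = log_polar_sample(frame, gaze_xy=(40, 40))
print(observation.shape, frame.size / observation.size)
```

In a setup like the one the abstract describes, a gaze-control policy would choose `gaze_xy` at each step and the agent would see only the sampled array. How the paper parameterises gaze actions is not stated here, so that part is left out of the sketch.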
Taxonomy
Topics: Gaze Tracking and Assistive Technology · Visual Attention and Saliency Detection
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory · Convolution · Softmax · Dense Connections · Entropy Regularization · A3C
