Unsupervised Predictive Memory in a Goal-Directed Agent

Greg Wayne; Chia-Chun Hung; David Amos; Mehdi Mirza; Arun Ahuja; Agnieszka Grabska-Barwinska; Jack Rae; Piotr Mirowski; Joel Z. Leibo; Adam Santoro; Mevlana Gemici; Malcolm Reynolds; Tim Harley; Josh Abramson; Shakir Mohamed; Danilo Rezende; David Saxton; Adam Cain; Chloe Hillier; David Silver; Koray Kavukcuoglu; Matt Botvinick; Demis Hassabis; Timothy Lillicrap

arXiv:1803.10760 · cs.LG · March 29, 2018 · 148 citations

Open Access · 1 Repo

TL;DR

This paper introduces MERLIN, a memory-augmented reinforcement learning agent that handles partial observability in complex 3D environments by using predictive modeling to guide memory formation.

Contribution

The paper presents MERLIN, an agent architecture that combines memory, reinforcement learning, and inference, enabling agents to solve complex, partially observable tasks without strong simplifying assumptions about the dimensionality of sensory input.

Findings

1. MERLIN outperforms traditional RL algorithms in partially observable 3D tasks.

2. Memory guided by predictive modeling improves long-term task performance.

3. The model operates effectively without assumptions about sensory input dimensionality.
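
As a loose illustration of the second finding, the sketch below shows one way an agent could store compressed latent codes in memory and later retrieve them by content. It is a minimal toy under stated assumptions, not the MERLIN implementation: `encode` is a hypothetical fixed stand-in for an encoder that would be trained with a predictive objective, `LatentMemory` and the `sharpness` parameter are illustrative names, and nothing is learned here.

```python
import math


def encode(observation):
    """Hypothetical stand-in for a learned encoder.

    Compresses an even-length observation by averaging adjacent pairs.
    In a real agent this mapping would be trained, not hand-written.
    """
    return [(observation[i] + observation[i + 1]) / 2.0
            for i in range(0, len(observation), 2)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-8)


class LatentMemory:
    """Toy episodic memory holding one latent code per time step."""

    def __init__(self):
        self.rows = []

    def write(self, latent):
        # Store the compressed code, not the raw observation.
        self.rows.append(list(latent))

    def read(self, key, sharpness=10.0):
        """Content-based read: softmax over cosine similarity to `key`."""
        if not self.rows:
            raise ValueError("cannot read from an empty memory")
        sims = [cosine(key, row) for row in self.rows]
        top = max(sims)  # subtract the max for numerical stability
        weights = [math.exp(sharpness * (s - top)) for s in sims]
        total = sum(weights)
        weights = [w / total for w in weights]
        dim = len(self.rows[0])
        readout = [sum(w * row[i] for w, row in zip(weights, self.rows))
                   for i in range(dim)]
        return readout, weights


# Usage: store three codes, then query with a noisy version of the first.
memory = LatentMemory()
for obs in ([1, 1, 0, 0], [0, 0, 1, 1], [1, 1, 1, 1]):
    memory.write(encode(obs))
readout, weights = memory.read(encode([0.9, 1.1, 0.0, 0.1]))
best_match = weights.index(max(weights))
```

The design point is that what gets written to memory is the compressed code rather than the raw observation, so a noisy or partial cue can still retrieve the matching entry.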

Abstract

Animals execute goal-directed behaviours despite the limited range and scope of their sensors. To cope, they explore environments and store memories maintaining estimates of important information that is not presently available. Recently, progress has been made with artificial intelligence (AI) agents that learn to perform tasks from sensory input, even at a human level, by merging reinforcement learning (RL) algorithms with deep neural networks, and the excitement surrounding these results has led to the pursuit of related ideas as explanations of non-human animal learning. However, we demonstrate that contemporary RL algorithms struggle to solve simple tasks when enough information is concealed from the sensors of the agent, a property called "partial observability". An obvious requirement for handling partially observed tasks is access to extensive memory, but we show memory is not…

Code & Models

Repositories

yosider/merlin

Taxonomy

Topics: Reinforcement Learning in Robotics · Neural dynamics and brain function · Explainable Artificial Intelligence (XAI)