On Improving Deep Reinforcement Learning for POMDPs

Pengfei Zhu; Xin Li; Pascal Poupart; Guanghui Miao

arXiv:1704.07978·cs.LG·May 25, 2018·81 cites

TL;DR

This paper introduces ADRQN, a novel deep RL architecture that combines action-observation pairs with LSTM to improve learning in partially observable environments, demonstrated on flickering Atari games.

Contribution

The paper proposes ADRQN, which encodes each action with a fully connected layer, pairs it with the convolutional encoding of the observation, and feeds the sequence of pairs to an LSTM. This addresses the scarcity of deep RL work on partially observable environments.

Findings

01 ADRQN outperforms traditional DQNs in partially observable Atari games.

02 Pairing actions with observations before the LSTM improves latent state learning.

03 The improvement is demonstrated in several partially observable domains, including flickering Atari games.

Abstract

Deep Reinforcement Learning (RL) recently emerged as one of the most competitive approaches for learning in sequential decision making problems with fully observable environments, e.g., computer Go. However, very little work has been done in deep RL to handle partially observable environments. We propose a new architecture called Action-specific Deep Recurrent Q-Network (ADRQN) to enhance learning performance in partially observable domains. Actions are encoded by a fully connected layer and coupled with a convolutional observation to form an action-observation pair. The time series of action-observation pairs are then integrated by an LSTM layer that learns latent states based on which a fully connected layer computes Q-values as in conventional Deep Q-Networks (DQNs). We demonstrate the effectiveness of our new architecture in several partially observable domains, including flickering…
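The data flow described in the abstract can be sketched as a forward pass. This is a minimal illustration, not the authors' implementation: the layer sizes, the random weights, the names `adrqn_step` and `q_values_for_history`, and the use of a single dense layer as a stand-in for the convolutional observation encoder are all assumptions made to keep the example short and dependency-light.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
N_ACTIONS, OBS_DIM = 4, 16
ACT_EMB, OBS_EMB, HIDDEN = 8, 24, 32


def init(*shape):
    """Small random weights; a real agent would learn these."""
    return rng.normal(0.0, 0.1, size=shape)


params = {
    "W_act": init(ACT_EMB, N_ACTIONS), "b_act": np.zeros(ACT_EMB),
    # Assumption: a dense layer stands in for the convolutional encoder.
    "W_obs": init(OBS_EMB, OBS_DIM), "b_obs": np.zeros(OBS_EMB),
    "W_lstm": init(4 * HIDDEN, ACT_EMB + OBS_EMB + HIDDEN),
    "b_lstm": np.zeros(4 * HIDDEN),
    "W_q": init(N_ACTIONS, HIDDEN), "b_q": np.zeros(N_ACTIONS),
}


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def adrqn_step(p, prev_action, obs, h, c):
    """One time step: encode (action, observation), update LSTM, emit Q-values."""
    a_onehot = np.eye(N_ACTIONS)[prev_action]
    a_emb = p["W_act"] @ a_onehot + p["b_act"]               # action encoder
    o_emb = np.maximum(0.0, p["W_obs"] @ obs + p["b_obs"])   # observation encoder
    pair = np.concatenate([a_emb, o_emb])                    # action-observation pair

    # Standard LSTM cell: input, forget, candidate, and output gates.
    z = p["W_lstm"] @ np.concatenate([pair, h]) + p["b_lstm"]
    i, f, g, o = np.split(z, 4)
    c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h_new = sigmoid(o) * np.tanh(c_new)                      # latent state

    q = p["W_q"] @ h_new + p["b_q"]                          # one Q-value per action
    return q, h_new, c_new


def q_values_for_history(p, actions, observations):
    """Fold a history of (previous action, observation) pairs into Q-values."""
    h, c = np.zeros(HIDDEN), np.zeros(HIDDEN)
    q = np.zeros(N_ACTIONS)
    for a, obs in zip(actions, observations):
        q, h, c = adrqn_step(p, a, obs, h, c)
    return q


# Usage: three time steps of made-up data, then a greedy action choice.
observations = rng.normal(size=(3, OBS_DIM))
q = q_values_for_history(params, [0, 1, 2], observations)
greedy_action = int(np.argmax(q))
```

Because the previous action enters the recurrent state alongside the observation, two histories with identical observations but different actions generally yield different Q-values, which is the property the architecture relies on under partial observability.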

Code & Models

Repositories

bit1029public/ADRQN
none · Official

Taxonomy

Topics: Elevator Systems and Control · Machine Learning and ELM · Optimization and Search Problems

Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory