On Improving Deep Reinforcement Learning for POMDPs

Pengfei Zhu; Xin Li; Pascal Poupart; Guanghui Miao

arXiv:1804.06309 · cs.LG · May 9, 2018

TL;DR

This paper introduces ADRQN, a novel deep RL architecture that effectively handles partially observable environments by integrating action-observation pairs with an LSTM, improving learning performance in such domains.

Contribution

The paper proposes ADRQN, a new architecture that combines action-observation encoding with an LSTM layer to improve deep RL in POMDPs, an area in which very little deep RL work has been done.

Findings

01. ADRQN outperforms traditional DQNs in flickering Atari games.

02. The architecture effectively captures latent states in partially observable environments.

03. Experimental results show improved learning stability and accuracy.

Abstract

Deep Reinforcement Learning (RL) recently emerged as one of the most competitive approaches for learning in sequential decision making problems with fully observable environments, e.g., computer Go. However, very little work has been done in deep RL to handle partially observable environments. We propose a new architecture called Action-specific Deep Recurrent Q-Network (ADRQN) to enhance learning performance in partially observable domains. Actions are encoded by a fully connected layer and coupled with a convolutional observation to form an action-observation pair. The time series of action-observation pairs are then integrated by an LSTM layer that learns latent states based on which a fully connected layer computes Q-values as in conventional Deep Q-Networks (DQNs). We demonstrate the effectiveness of our new architecture in several partially observable domains, including flickering…
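
The data flow described in the abstract can be sketched as a forward pass in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the class name ADRQNSketch, all layer sizes, the single convolutional layer, the ReLU activations, the random weights, and the choice to pair each observation with the previously taken action are hypothetical and are not given in the text above.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init(shape):
    # Small random weights; a real agent would learn these by Q-learning.
    return rng.normal(0.0, 0.1, size=shape)

class ADRQNSketch:
    """Hypothetical forward pass: conv(obs) + fc(action) -> LSTM -> fc -> Q."""

    def __init__(self, obs_shape=(8, 8), n_actions=4, k=3,
                 n_filters=4, act_dim=8, hidden=16):
        self.n_actions, self.hidden = n_actions, hidden
        self.kernels = init((n_filters, k, k))          # convolutional filters
        h, w = obs_shape
        conv_dim = n_filters * (h - k + 1) * (w - k + 1)
        self.W_act = init((n_actions, act_dim))         # action encoder (FC)
        in_dim = conv_dim + act_dim
        self.W_lstm = init((in_dim + hidden, 4 * hidden))
        self.b_lstm = np.zeros(4 * hidden)
        self.W_q = init((hidden, n_actions))            # Q-value head (FC)
        self.b_q = np.zeros(n_actions)

    def initial_state(self):
        return np.zeros(self.hidden), np.zeros(self.hidden)

    def encode_obs(self, obs):
        # "Valid" 2D cross-correlation over sliding windows, then ReLU.
        k = self.kernels.shape[-1]
        windows = np.lib.stride_tricks.sliding_window_view(obs, (k, k))
        feat = np.einsum("ijkl,fkl->fij", windows, self.kernels)
        return np.maximum(feat, 0.0).ravel()

    def encode_action(self, action):
        one_hot = np.zeros(self.n_actions)
        one_hot[action] = 1.0
        return np.maximum(one_hot @ self.W_act, 0.0)

    def step(self, action, obs, state):
        h, c = state
        # Action-observation pair: concatenate the two encodings.
        x = np.concatenate([self.encode_obs(obs), self.encode_action(action)])
        # Standard LSTM cell: input, forget, output gates and candidate.
        gates = np.concatenate([x, h]) @ self.W_lstm + self.b_lstm
        i, f, o, g = np.split(gates, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)                     # latent state estimate
        q = h @ self.W_q + self.b_q                     # one Q-value per action
        return q, (h, c)

# Usage: roll the recurrent state forward over a short random episode.
net = ADRQNSketch()
state = net.initial_state()
prev_action = 0
for t in range(5):
    obs = rng.random((8, 8))                            # stand-in observation
    q, state = net.step(prev_action, obs, state)
    prev_action = int(np.argmax(q))                     # greedy action
```

The point of the sketch is that the Q-values at each step depend on the LSTM state carried over from earlier action-observation pairs rather than on the current observation alone, which is what lets a recurrent network of this kind compensate for missing or obscured observations.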

Taxonomy

Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory