On Improving Deep Reinforcement Learning for POMDPs
Pengfei Zhu, Xin Li, Pascal Poupart, Guanghui Miao

TL;DR
This paper introduces ADRQN, a deep RL architecture that pairs each action with the observation that follows it and feeds the sequence of action-observation pairs to an LSTM, improving learning in partially observable environments; it is evaluated on flickering Atari games.
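Here is one concrete reading of "flickering". A common way to turn an Atari game into a POMDP is to blank the entire screen at random on some frames, so the agent has to remember what it saw earlier. This is a minimal sketch under stated assumptions. The `flicker` helper, the blanking probability of 0.5 and the 84×84 frame size are illustrative choices and are not taken from the paper.

```python
import numpy as np


def flicker(frame, p_obscure=0.5, rng=None):
    """Hypothetical helper: blank the whole frame with probability p_obscure."""
    rng = np.random.default_rng() if rng is None else rng
    if rng.random() < p_obscure:
        return np.zeros_like(frame)  # obscured: the agent sees nothing this step
    return frame  # revealed: the frame passes through unchanged


# Usage: roughly half of 1,000 frames come back blank when p_obscure = 0.5.
rng = np.random.default_rng(0)
frame = np.ones((84, 84), dtype=np.uint8)  # 84x84 is an assumed preprocessing size
outputs = [flicker(frame, 0.5, rng) for _ in range(1000)]
blank_fraction = sum(not out.any() for out in outputs) / len(outputs)
```

A single frame from such an environment can be uninformative. A memoryless DQN therefore struggles, which is the motivation for a recurrent architecture.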
Contribution
The paper proposes ADRQN, which integrates an action-specific encoding with a recurrent neural network to improve deep RL performance in POMDPs, a setting where comparatively little deep RL work exists.
Findings
ADRQN outperforms traditional DQNs in partially observable Atari games.
Feeding action-observation pairs to an LSTM improves latent-state learning.
Deep RL performance in POMDPs improves as a result.
Abstract
Deep Reinforcement Learning (RL) has recently emerged as one of the most competitive approaches for learning in sequential decision making problems with fully observable environments, e.g., computer Go. However, very little work has been done in deep RL to handle partially observable environments. We propose a new architecture called Action-specific Deep Recurrent Q-Network (ADRQN) to enhance learning performance in partially observable domains. Actions are encoded by a fully connected layer and coupled with a convolutional observation to form an action-observation pair. The time series of action-observation pairs is then integrated by an LSTM layer that learns latent states, from which a fully connected layer computes Q-values as in conventional Deep Q-Networks (DQNs). We demonstrate the effectiveness of our new architecture in several partially observable domains, including flickering…
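The data flow the abstract describes can be sketched as an untrained NumPy forward pass. Every concrete detail below is an assumption made for illustration:

* the class name `ADRQNSketch` and all layer sizes;
* concatenation as the way action and observation features are "coupled";
* a single dense layer standing in for the convolutional tower;
* pairing each observation with the action taken just before it.

The sketch shows the shape of the computation, not the authors' implementation.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class ADRQNSketch:
    """Hypothetical, untrained forward pass mirroring the ADRQN data flow."""

    def __init__(self, obs_dim, n_actions, act_dim=16, obs_feat=64, hidden=32, seed=0):
        rng = np.random.default_rng(seed)
        self.n_actions, self.hidden = n_actions, hidden
        scale = 0.1
        # Action encoder: fully connected layer on a one-hot action (linear here; assumption).
        self.Wa = rng.normal(0, scale, (n_actions, act_dim))
        self.ba = np.zeros(act_dim)
        # Observation encoder: one dense layer + ReLU standing in for the conv layers.
        self.Wo = rng.normal(0, scale, (obs_dim, obs_feat))
        self.bo = np.zeros(obs_feat)
        # LSTM parameters, four gates stacked: input, forget, candidate, output.
        in_dim = act_dim + obs_feat
        self.Wx = rng.normal(0, scale, (in_dim, 4 * hidden))
        self.Wh = rng.normal(0, scale, (hidden, 4 * hidden))
        self.bl = np.zeros(4 * hidden)
        # Q head: fully connected layer from the latent state to one value per action.
        self.Wq = rng.normal(0, scale, (hidden, n_actions))
        self.bq = np.zeros(n_actions)

    def lstm_step(self, x, h, c):
        H = self.hidden
        z = x @ self.Wx + h @ self.Wh + self.bl
        i, f = sigmoid(z[:H]), sigmoid(z[H:2 * H])
        g, o = np.tanh(z[2 * H:3 * H]), sigmoid(z[3 * H:])
        c = f * c + i * g  # update the cell (latent) state
        h = o * np.tanh(c)
        return h, c

    def q_values(self, prev_actions, observations):
        """Q-values after consuming the sequence of (a_{t-1}, o_t) pairs."""
        h = np.zeros(self.hidden)
        c = np.zeros(self.hidden)
        for a, obs in zip(prev_actions, observations):
            a_feat = np.eye(self.n_actions)[a] @ self.Wa + self.ba
            o_feat = np.maximum(0.0, obs @ self.Wo + self.bo)
            pair = np.concatenate([a_feat, o_feat])  # action-observation pair
            h, c = self.lstm_step(pair, h, c)
        return h @ self.Wq + self.bq


# Usage: 5 steps of 10-dimensional observations, 4 discrete actions.
model = ADRQNSketch(obs_dim=10, n_actions=4)
obs_seq = np.random.default_rng(1).normal(size=(5, 10))
q = model.q_values([0, 2, 1, 3, 0], obs_seq)
greedy_action = int(np.argmax(q))
```

The previous action is fed in alongside the observation. This lets the recurrent state condition on what the agent did as well as on what it saw, so two identical observation sequences reached through different actions yield different Q-values.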
Taxonomy
Topics: Elevator Systems and Control · Machine Learning and ELM · Optimization and Search Problems
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory
