An Empirical Comparison of Neural Architectures for Reinforcement Learning in Partially Observable Environments

Denis Steckelmacher; Peter Vrancx

arXiv:1512.05509 · cs.NE · December 18, 2015 · 2 cites

Open Access

TL;DR

This paper empirically compares three recurrent neural network architectures (LSTM, GRU and MUT1) for reinforcement learning in partially observable environments. GRU outperforms LSTM and MUT1 on most of the problems considered, and advantage learning tends to improve results.

Contribution

It provides a comparative analysis of recurrent architectures (LSTM, GRU and MUT1) trained with fitted neural Q iteration, showing that GRU performs best on most problems and that a variant based on Advantage values tends to help.

Findings

01

GRU outperforms LSTM and MUT1 in most tasks.

02

Advantage learning tends to produce better results than learning Q values.

03

GRU requires fewer training episodes and less CPU time before learning a very good policy.

Abstract

This paper explores the performance of fitted neural Q iteration for reinforcement learning in several partially observable environments, using three recurrent neural network architectures: Long Short-Term Memory, Gated Recurrent Unit and MUT1, a recurrent neural architecture evolved from a pool of several thousand candidate architectures. A variant of fitted Q iteration, based on Advantage values instead of Q values, is also explored. The results show that GRU performs significantly better than LSTM and MUT1 for most of the problems considered, requiring fewer training episodes and less CPU time before learning a very good policy. Advantage learning also tends to produce better results.
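The difference between the Q-value and Advantage-value variants mentioned in the abstract can be illustrated with the regression targets each one would fit. This is only a minimal sketch, not the authors' implementation: the abstract does not give the update rule, so the advantage target below follows the commonly cited Harmon and Baird form, and the function names, the scaling constant `k`, and the example numbers are all assumptions made for illustration.

```python
from typing import Sequence


def q_target(reward: float, next_values: Sequence[float],
             gamma: float, terminal: bool = False) -> float:
    """Q-learning regression target: r + gamma * max_a' Q(s', a')."""
    bootstrap = 0.0 if terminal else gamma * max(next_values)
    return reward + bootstrap


def advantage_target(reward: float, current_values: Sequence[float],
                     next_values: Sequence[float], gamma: float,
                     k: float, terminal: bool = False) -> float:
    """Advantage-learning target (assumed Harmon-and-Baird-style form):

        max_a' A(s, a') + (r + gamma * max_a' A(s', a') - max_a' A(s, a')) / k

    k is a hypothetical scaling constant; k < 1 widens the gap between
    the best action and the others, and k = 1 recovers the Q target.
    """
    best_now = max(current_values)
    bootstrap = 0.0 if terminal else gamma * max(next_values)
    return best_now + (reward + bootstrap - best_now) / k


# Illustrative values only: network outputs for two actions in s and s'.
current = [0.5, 0.2]
following = [2.0, 1.0]
print(q_target(1.0, following, gamma=0.9))
print(advantage_target(1.0, current, following, gamma=0.9, k=0.5))
```

In a fitted-iteration setting, targets like these would be computed for a batch of recorded transitions and the recurrent network would then be regressed onto them. How the paper batches episodes and which scaling it uses are not stated in the abstract.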

Taxonomy

Topics: Reinforcement Learning in Robotics · Neural Networks and Applications · Data Stream Mining Techniques

Methods: Sigmoid Activation · Tanh Activation · Gated Recurrent Unit · Long Short-Term Memory