Sample Efficient Actor-Critic with Experience Replay

Ziyu Wang; Victor Bapst; Nicolas Heess; Volodymyr Mnih; Remi Munos; Koray Kavukcuoglu; Nando de Freitas

arXiv:1611.01224·cs.LG·July 11, 2017·222 cites

TL;DR

This paper introduces an actor-critic reinforcement learning algorithm that combines experience replay with truncated importance sampling and trust region optimization, achieving high sample efficiency and stability in complex environments.

Contribution

It presents new methods such as truncated importance sampling with bias correction and a stochastic dueling network architecture for improved sample efficiency and stability.

Findings

01

Performs well on the discrete 57-game Atari domain and on several continuous control problems.

02

Demonstrates improved sample efficiency over existing methods.

03

Provides a stable training framework for deep reinforcement learning.

Abstract

This paper presents an actor-critic deep reinforcement learning agent with experience replay that is stable, sample efficient, and performs remarkably well on challenging environments, including the discrete 57-game Atari domain and several continuous control problems. To achieve this, the paper introduces several innovations, including truncated importance sampling with bias correction, stochastic dueling network architectures, and a new trust region policy optimization method.
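The abstract names truncated importance sampling with bias correction without spelling it out. The sketch below illustrates the general idea for a discrete action set; it is not code from the paper. It assumes the importance weight is the ratio of target-policy to behaviour-policy probabilities, that the truncation threshold is a constant `c`, and that both distributions are strictly positive. The function names and the example numbers are hypothetical. The point it shows is that clipping the weight bounds its size, and a second term weighted by max(0, 1 - c/rho) under the target policy restores the part that clipping removed, so the two terms together recover the exact expectation.

```python
import numpy as np

def truncated_weights(pi, mu, c=10.0):
    """Return (clipped weight, correction coefficient) for each action.

    pi: target-policy probabilities, mu: behaviour-policy probabilities.
    Both are assumed strictly positive (an assumption of this sketch).
    """
    rho = pi / mu                                # plain importance weights
    truncated = np.minimum(c, rho)               # bounded by c
    correction = np.maximum(0.0, 1.0 - c / rho)  # nonzero only where rho > c
    return truncated, correction

def split_expectation(f, pi, mu, c=10.0):
    """Decompose E_pi[f] into a clipped off-policy term and a correction term."""
    truncated, correction = truncated_weights(pi, mu, c)
    behaviour_term = np.sum(mu * truncated * f)    # expectation under mu
    correction_term = np.sum(pi * correction * f)  # expectation under pi
    return behaviour_term, correction_term

# Hypothetical three-action example with a deliberately small threshold.
pi = np.array([0.7, 0.2, 0.1])
mu = np.array([0.05, 0.45, 0.5])
f = np.array([1.0, 2.0, 3.0])
behaviour_term, correction_term = split_expectation(f, pi, mu, c=2.0)
print(behaviour_term + correction_term, np.sum(pi * f))
```

In a sampled setting the first term would be estimated from replayed actions drawn under the behaviour policy, and the second from the current policy together with a learned value estimate; the exact sums here are only meant to make the decomposition easy to check.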

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Adversarial Robustness in Machine Learning

Methods: Experience Replay · Retrace · Trust Region Policy Optimization · Entropy Regularization · Double Q-learning · Dense Connections · Softmax · Convolution · Stochastic Dueling Network