Sample Efficient Actor-Critic with Experience Replay
Ziyu Wang, Victor Bapst, Nicolas Heess, Volodymyr Mnih, Remi Munos, Koray Kavukcuoglu, Nando de Freitas

TL;DR
This paper introduces an actor-critic reinforcement learning algorithm that combines experience replay with truncated importance sampling and trust region optimization, achieving high sample efficiency and stable training in complex environments.
Contribution
It presents new methods such as truncated importance sampling with bias correction and a stochastic dueling network architecture for improved sample efficiency and stability.
Findings
Performs well on the discrete 57-game Atari domain and on several continuous control problems.
Demonstrates improved sample efficiency over existing methods.
Provides a stable training framework for deep reinforcement learning.
Abstract
This paper presents an actor-critic deep reinforcement learning agent with experience replay that is stable, sample efficient, and performs remarkably well on challenging environments, including the discrete 57-game Atari domain and several continuous control problems. To achieve this, the paper introduces several innovations, including truncated importance sampling with bias correction, stochastic dueling network architectures, and a new trust region policy optimization method.
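As a rough illustration of the "truncated importance sampling with bias correction" idea named in the abstract, the sketch below computes the two weights for a discrete action set. The function name, argument names, and the default truncation threshold `c=10.0` are assumptions made for this example, not the authors' code. It only shows how the importance ratio between the target policy and the behaviour policy can be split into a clipped part and a correction part.

```python
import numpy as np

def truncated_importance_weights(pi, mu, c=10.0):
    """Split the importance ratio rho = pi / mu into two parts.

    pi : target-policy probabilities for each action
    mu : behaviour-policy probabilities for each action (must be > 0)
    c  : truncation threshold (hypothetical default for this sketch)

    Returns (rho, truncated, correction), where
      truncated  = min(c, rho)          # bounded weight on replayed actions
      correction = max(0, 1 - c / rho)  # weight on actions drawn from pi
    """
    pi = np.asarray(pi, dtype=float)
    mu = np.asarray(mu, dtype=float)
    rho = pi / mu
    truncated = np.minimum(c, rho)
    # Equivalent to max(0, 1 - c / rho), written to avoid dividing by rho = 0.
    correction = np.maximum(0.0, rho - c) / np.maximum(rho, c)
    return rho, truncated, correction


pi = np.array([0.7, 0.2, 0.1])
mu = np.array([0.05, 0.45, 0.5])
rho, truncated, correction = truncated_importance_weights(pi, mu, c=10.0)

# The clipped term (weighted by mu) plus the correction term (weighted by pi)
# recovers the target policy's weight on each action, so the split adds no bias.
recovered = mu * truncated + pi * correction
```

The clipped weight keeps the variance of the replay-based term bounded, while the correction term is non-zero only for actions whose ratio exceeds `c`.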
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Adversarial Robustness in Machine Learning
Methods: Experience Replay · Retrace · Trust Region Policy Optimization · Entropy Regularization · Double Q-learning · Dense Connections · Softmax · Convolution · Stochastic Dueling Network
