# Sample-Efficient Model-Free Reinforcement Learning with Off-Policy Critics

**Authors:** Denis Steckelmacher, Hélène Plisnier, Diederik M. Roijers, Ann Nowé

arXiv: 1903.04193 · 2019-06-13

## TL;DR

BDPI is an off-policy actor-critic algorithm for continuous states and discrete actions. It fully decouples the actor from several off-policy critics, which makes it stable and more sample-efficient than Bootstrapped DQN, PPO, and ACKTR on discrete, continuous, and pixel-based tasks.

## Contribution

Introduces BDPI (Bootstrapped Dual Policy Iteration), a model-free actor-critic algorithm with several off-policy critics and a fully decoupled actor, improving sample efficiency and robustness to hyper-parameters.

## Key findings

- BDPI outperforms Bootstrapped DQN, PPO, and ACKTR in sample efficiency.
- BDPI demonstrates high stability and robustness to hyper-parameters.
- BDPI is effective across discrete, continuous, and pixel-based tasks.

## Abstract

Value-based reinforcement-learning algorithms provide state-of-the-art results in model-free discrete-action settings, and tend to outperform actor-critic algorithms. We argue that actor-critic algorithms are limited by their need for an on-policy critic. We propose Bootstrapped Dual Policy Iteration (BDPI), a novel model-free reinforcement-learning algorithm for continuous states and discrete actions, with an actor and several off-policy critics. Off-policy critics are compatible with experience replay, ensuring high sample-efficiency, without the need for off-policy corrections. The actor, by slowly imitating the average greedy policy of the critics, leads to high-quality and state-specific exploration, which we compare to Thompson sampling. Because the actor and critics are fully decoupled, BDPI is remarkably stable, and unusually robust to its hyper-parameters. BDPI is significantly more sample-efficient than Bootstrapped DQN, PPO, and ACKTR, on discrete, continuous and pixel-based tasks. Source code: https://github.com/vub-ai-lab/bdpi.
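To illustrate the phrase "slowly imitating the average greedy policy of the critics", here is a minimal tabular sketch in Python. It is not taken from the BDPI source code. The function names (`greedy_policy`, `actor_update`), the Q-table representation, and the step size `rate` are assumptions made for illustration, and a full implementation would likely use function approximators rather than tables.

```python
import numpy as np


def greedy_policy(q_values):
    """Return the one-hot greedy policy of a (n_states, n_actions) Q-table."""
    n_states, n_actions = q_values.shape
    policy = np.zeros((n_states, n_actions))
    # Put probability 1 on the highest-valued action in every state.
    policy[np.arange(n_states), q_values.argmax(axis=1)] = 1.0
    return policy


def actor_update(actor, critics, rate=0.05):
    """Move the actor a small step toward the critics' average greedy policy.

    actor:   (n_states, n_actions) array of action probabilities.
    critics: list of (n_states, n_actions) Q-tables (hypothetical stand-ins
             for the off-policy critics).
    rate:    assumed step size; small values give the "slow" imitation.
    """
    avg_greedy = np.mean([greedy_policy(q) for q in critics], axis=0)
    # A convex combination of two distributions is still a distribution.
    return (1.0 - rate) * actor + rate * avg_greedy


# Usage: two critics that agree in state 0 and disagree in state 1.
critics = [
    np.array([[1.0, 0.0], [0.0, 1.0]]),
    np.array([[1.0, 0.0], [1.0, 0.0]]),
]
actor = np.full((2, 2), 0.5)
actor = actor_update(actor, critics, rate=0.5)
```

In this sketch, states where the critics agree push the actor toward a single action, while states where they disagree keep the actor's distribution spread out. That is one way to read the "state-specific exploration" described in the abstract.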

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1903.04193/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/1903.04193/full.md

## References

47 references — full list in the complete paper: https://tomesphere.com/paper/1903.04193/full.md

---
Source: https://tomesphere.com/paper/1903.04193