# TD-Regularized Actor-Critic Methods

**Authors:** Simone Parisi, Voot Tangkaratt, Jan Peters, and Mohammad Emtiyaz Khan

arXiv: 1812.08288 · 2019-02-26

## TL;DR

This paper introduces TD-regularization for actor-critic methods. The actor's learning objective is penalized by the critic's temporal-difference (TD) error, which improves stability and overall performance on standard reinforcement learning benchmarks.

## Contribution

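It proposes a simple, plug-and-play regularization technique that improves the stability of actor-critic algorithms. It does this by adding a penalty on the critic's temporal-difference (TD) error to the actor's learning objective, so the actor avoids large updates while the critic is inaccurate.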
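For reference, the regularized actor objective can be sketched as follows. This uses our own notation: $\theta$ for actor parameters, $\omega$ for critic parameters, $\gamma$ for the discount factor, and $\eta \ge 0$ for the regularization strength. The squared one-step TD error is our reading of the penalty and should be checked against the full paper.

$$
\tilde{J}(\theta) = J(\theta) - \eta \, \mathbb{E}_{s \sim \rho_{\pi_\theta},\; a \sim \pi_\theta(\cdot \mid s),\; s' \sim p(\cdot \mid s, a),\; a' \sim \pi_\theta(\cdot \mid s')}\!\left[\delta_Q(s, a, s', a')^2\right],
\qquad
\delta_Q(s, a, s', a') = r(s, a) + \gamma\, \widehat{Q}_\omega(s', a') - \widehat{Q}_\omega(s, a).
$$

The expectation is taken under the current policy, so the penalty depends on $\theta$. Maximizing $\tilde{J}$ therefore trades the original objective $J(\theta)$ against the critic's TD error along the policy's own trajectories. With $\eta = 0$ the objective reduces to the unregularized $J(\theta)$.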

## Key findings

- Enhanced stability in actor-critic training.
- Improved performance on standard benchmarks.
- Plug-and-play: the regularizer can be added to existing actor-critic methods.

## Abstract

Actor-critic methods can achieve incredible performance on difficult reinforcement learning problems, but they are also prone to instability. This is partly due to the interaction between the actor and critic during learning, e.g., an inaccurate step taken by one of them might adversely affect the other and destabilize the learning. To avoid such issues, we propose to regularize the learning objective of the actor by penalizing the temporal difference (TD) error of the critic. This improves stability by avoiding large steps in the actor update whenever the critic is highly inaccurate. The resulting method, which we call the TD-regularized actor-critic method, is a simple plug-and-play approach to improve stability and overall performance of the actor-critic methods. Evaluations on standard benchmarks confirm this.
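The abstract describes the mechanism only in words. The following minimal Python sketch shows how the penalized actor objective could be evaluated on a batch of sampled transitions. It assumes the penalty is η times the mean squared one-step TD error of a Q-function critic, with the next action a' drawn from the current policy. `Transition`, `td_errors`, `td_regularized_objective`, and the `done` flag are hypothetical names chosen for illustration and are not the authors' code. The sketch computes only the objective's value. A real implementation would differentiate it with respect to the actor parameters using an autodiff framework.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Hypothetical critic signature: Q(state, action) -> float.
QFunction = Callable[[float, float], float]


@dataclass
class Transition:
    """One sampled step (s, a, r, s', a'); a' is assumed drawn from the current policy at s'."""
    state: float
    action: float
    reward: float
    next_state: float
    next_action: float
    done: bool = False  # if True, do not bootstrap from Q(s', a')


def td_errors(batch: Sequence[Transition], q: QFunction, gamma: float) -> List[float]:
    """One-step TD errors of the critic: delta = r + gamma * Q(s', a') - Q(s, a)."""
    deltas = []
    for t in batch:
        bootstrap = 0.0 if t.done else gamma * q(t.next_state, t.next_action)
        deltas.append(t.reward + bootstrap - q(t.state, t.action))
    return deltas


def td_regularized_objective(
    surrogate: float,
    batch: Sequence[Transition],
    q: QFunction,
    gamma: float,
    eta: float,
) -> float:
    """Base actor objective minus eta times the critic's mean squared TD error.

    `surrogate` stands in for whatever the underlying actor-critic method
    maximizes; the subtracted term is the assumed TD-regularizer.
    """
    if not batch:
        raise ValueError("batch must contain at least one transition")
    deltas = td_errors(batch, q, gamma)
    penalty = sum(d * d for d in deltas) / len(deltas)
    return surrogate - eta * penalty


if __name__ == "__main__":
    def toy_q(s: float, a: float) -> float:
        """Deliberately inaccurate toy critic."""
        return s + a

    batch = [Transition(state=0.0, action=1.0, reward=1.0, next_state=1.0, next_action=0.5)]
    print(td_errors(batch, toy_q, gamma=0.5))                               # [0.75]
    print(td_regularized_objective(2.0, batch, toy_q, gamma=0.5, eta=0.5))  # 1.71875
```

When the critic's TD errors are zero, the penalty vanishes and the regularized objective equals the base objective. The larger the critic's errors on the sampled transitions, the more the objective is reduced. This is the sense in which the abstract says the regularizer discourages large actor steps while the critic is highly inaccurate.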

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1812.08288/full.md

## Figures

85 figures with captions in the complete paper: https://tomesphere.com/paper/1812.08288/full.md

## References

44 references — full list in the complete paper: https://tomesphere.com/paper/1812.08288/full.md

---
Source: https://tomesphere.com/paper/1812.08288