# Variance Reduction in Actor Critic Methods (ACM)

**Authors:** Eric Benhamou

arXiv: 1907.09765 · 2019-07-24

## TL;DR

This paper provides a theoretical analysis of Actor Critic Methods, showing they are control variate estimators and introducing a new variance-reduced formulation for Advantage Actor Critic methods.

## Contribution

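Using the projection theorem, the paper proves that the Q Actor Critic (QAC) and Advantage Actor Critic (A2C) methods are $L^2$-optimal among control variate estimators spanned by functions of the current state and action, and it derives a new, lower-variance A2C formulation.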

## Key findings

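- QAC and A2C are optimal in the $L^2$ norm among control variate estimators spanned by functions conditioned on the current state and action.
- The new A2C formulation has lower variance than the traditional A2C method.
- The optimality result gives a theoretical justification for the strong performance of QAC and A2C in deep policy gradient methods.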
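
As a reminder of why a projection argument yields variance optimality, the generic single control variate case can be written out as below. This is the textbook scalar version, not the paper's derivation: the symbols $X$ (the base estimator), $Y$ (a zero-mean control variate) and $c$ (its coefficient) are assumed here for illustration, whereas the paper works with a space of functions of the current state and action.

$$
\hat{X}_c = X - cY, \qquad \mathbb{E}[Y] = 0 \;\Rightarrow\; \mathbb{E}[\hat{X}_c] = \mathbb{E}[X] \ \text{for every } c
$$

$$
c^{*} = \arg\min_c \operatorname{Var}(X - cY) = \frac{\operatorname{Cov}(X, Y)}{\operatorname{Var}(Y)}
$$

$$
\operatorname{Var}(X) = \operatorname{Var}(X - c^{*}Y) + \operatorname{Var}(c^{*}Y)
$$

The last identity is Pythagoras' theorem in $L^2$: $c^{*}Y$ is the orthogonal projection of $X - \mathbb{E}[X]$ onto the span of $Y$, so the residual $X - c^{*}Y$ is uncorrelated with $Y$ and its variance can never exceed that of $X$.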

## Abstract

After presenting Actor Critic Methods (ACM), we show ACM are control variate estimators. Using the projection theorem, we prove that the Q and Advantage Actor Critic (A2C) methods are optimal in the sense of the $L^2$ norm for the control variate estimators spanned by functions conditioned on the current state and action. This straightforward application of Pythagoras' theorem provides a theoretical justification of the strong performance of QAC and AAC, most often referred to as A2C, in deep policy gradient methods. This enables us to derive a new formulation for Advantage Actor Critic methods that has lower variance and improves the traditional A2C method.
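
The control variate view in the abstract can be illustrated with a small numerical sketch. The example below is not the paper's new formulation; it only shows the standard effect of subtracting a state-value baseline (that is, using an advantage) in a score-function gradient estimator. The toy one-state problem, the function names `softmax` and `score_gradient_samples`, and all numeric values are assumptions made for illustration.

```python
import math
import random
import statistics


def softmax(logits):
    """Return softmax probabilities for a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def score_gradient_samples(logits, mean_rewards, baseline, n_samples, seed=0):
    """Draw single-sample score-function estimates of dJ/d(logits[0])."""
    rng = random.Random(seed)
    probs = softmax(logits)
    actions = list(range(len(logits)))
    samples = []
    for _ in range(n_samples):
        action = rng.choices(actions, weights=probs)[0]
        reward = mean_rewards[action] + rng.gauss(0.0, 1.0)
        # Softmax score: d log pi(action) / d logits[0] = 1[action == 0] - pi(0)
        score = (1.0 if action == 0 else 0.0) - probs[0]
        # The baseline term has zero mean because E[score] = 0,
        # so it acts as a control variate and leaves the mean unchanged.
        samples.append(score * (reward - baseline))
    return samples


# Hypothetical one-state problem with three actions (assumed values).
logits = [0.2, -0.1, 0.4]
mean_rewards = [10.0, 11.0, 12.0]
probs = softmax(logits)

# State value V = E[reward] under the policy; reward - V is the advantage signal.
value = sum(p * r for p, r in zip(probs, mean_rewards))

# Closed-form gradient of J = sum_a pi(a) * r(a) with respect to logits[0].
exact_gradient = probs[0] * (mean_rewards[0] - value)

plain = score_gradient_samples(logits, mean_rewards, 0.0, 20000)
with_baseline = score_gradient_samples(logits, mean_rewards, value, 20000)

print("exact gradient:", exact_gradient)
print("no baseline   mean/var:", statistics.mean(plain), statistics.variance(plain))
print("with baseline mean/var:", statistics.mean(with_baseline),
      statistics.variance(with_baseline))
```

Both runs share the same seed, so they see the same actions and reward noise and differ only in the subtracted baseline. In this setup the two sample means should agree with `exact_gradient` up to Monte Carlo error, while the baseline version should show a much smaller sample variance, because the rewards carry a large common offset that the baseline removes.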

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1907.09765/full.md

## Figures

1 figure with captions in the complete paper: https://tomesphere.com/paper/1907.09765/full.md

## References

21 references — full list in the complete paper: https://tomesphere.com/paper/1907.09765/full.md

---
Source: https://tomesphere.com/paper/1907.09765