# Compatible Natural Gradient Policy Search

**Authors:** Joni Pajarinen, Hong Linh Thai, Riad Akrour, Jan Peters, Gerhard Neumann

arXiv: 1902.02823 · 2019-02-11

## TL;DR

This paper introduces compatible policy search (COPOS), a policy search method that bounds the entropy loss of natural gradient updates to avoid premature convergence. It reports state-of-the-art results on challenging continuous control tasks and on discrete partially observable tasks.

## Contribution

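The paper shows that natural gradient and trust region optimization are equivalent when the natural parameterization of a standard exponential policy distribution is combined with compatible value function approximation. It also proposes COPOS, which bounds entropy loss during policy updates.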

## Key findings

- COPOS achieves state-of-the-art results in continuous control tasks.
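- COPOS yields state-of-the-art results in discrete partially observable tasks.
- Standard natural gradient updates may reduce policy entropy according to a wrong schedule, which leads to premature convergence.
- Natural gradient and trust region optimization are equivalent under the natural parameterization of a standard exponential policy distribution with compatible value function approximation.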

## Abstract

Trust-region methods have yielded state-of-the-art results in policy search. A common approach is to use KL-divergence to bound the region of trust resulting in a natural gradient policy update. We show that the natural gradient and trust region optimization are equivalent if we use the natural parameterization of a standard exponential policy distribution in combination with compatible value function approximation. Moreover, we show that standard natural gradient updates may reduce the entropy of the policy according to a wrong schedule leading to premature convergence. To control entropy reduction we introduce a new policy search method called compatible policy search (COPOS) which bounds entropy loss. The experimental results show that COPOS yields state-of-the-art results in challenging continuous control tasks and in discrete partially observable tasks.
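To make the entropy argument concrete, here is a minimal sketch on a single-state problem with a softmax policy, whose logits act as natural parameters. It assumes the update that results from a KL bound plus an entropy bound takes the form `pi_new ∝ pi_old^(eta/(eta+omega)) * exp(A/(eta+omega))`, where `eta` and `omega` stand for the multipliers of the KL and entropy constraints. The function names, the fixed multiplier values, and the advantage vector are all hypothetical and chosen for illustration; in practice the multipliers would come from solving a dual problem, and the full paper is the authority on the actual algorithm.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over a 1-D array of logits.
    z = logits - np.max(logits)
    p = np.exp(z)
    return p / p.sum()

def entropy(p):
    # Shannon entropy (in nats) of a strictly positive distribution.
    return float(-np.sum(p * np.log(p)))

def kl(p, q):
    # KL(p || q) for strictly positive discrete distributions.
    return float(np.sum(p * (np.log(p) - np.log(q))))

def update_logits(theta_old, advantages, eta, omega=0.0):
    # Assumed closed form: omega = 0 gives a plain KL-bounded step
    # (theta_old + A / eta); omega > 0 additionally rescales the logits
    # toward zero, which keeps the policy's entropy higher.
    return (eta * theta_old + advantages) / (eta + omega)

# Hypothetical numbers for illustration only.
theta_old = np.array([0.0, 0.5, -0.5, 0.2])
advantages = np.array([1.0, -0.5, 0.3, -0.8])
eta = 2.0

pi_old = softmax(theta_old)
pi_plain = softmax(update_logits(theta_old, advantages, eta))
pi_entropy = softmax(update_logits(theta_old, advantages, eta, omega=1.0))

print("entropy old:           ", entropy(pi_old))
print("entropy, KL bound only:", entropy(pi_plain))
print("entropy, with omega=1: ", entropy(pi_entropy))
print("KL(new || old), plain: ", kl(pi_plain, pi_old))
```

With `omega > 0` the new logits are a scaled-down copy of the plain update, so the resulting policy is closer to uniform and loses less entropy per step. This is only a toy illustration of why an extra entropy term changes the update schedule, not a reproduction of the paper's experiments.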

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1902.02823/full.md

## Figures

11 figures with captions in the complete paper: https://tomesphere.com/paper/1902.02823/full.md

## References

31 references — full list in the complete paper: https://tomesphere.com/paper/1902.02823/full.md

---
Source: https://tomesphere.com/paper/1902.02823