# Exploiting the Sign of the Advantage Function to Learn Deterministic Policies in Continuous Domains

**Authors:** Matthieu Zimmer, Paul Weng

arXiv: 1906.04556 · 2021-02-24

## TL;DR

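This paper gives a theoretical justification for the policy update used in CACLA and NFAC to learn deterministic policies in continuous domains, extends the approach into a new trust region algorithm, Penalized NFAC (PeNFAC), and reports that PeNFAC surpasses state-of-the-art algorithms for learning deterministic policies on several classic control problems.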

## Contribution

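The paper motivates the CACLA/NFAC policy update, which differs from that of deterministic policy gradient (DPG), by relating it to another update whose objective function is made explicit. It also introduces Penalized NFAC (PeNFAC), a trust region extension of the approach, and shows empirically that it is effective on several classic control problems.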

## Key findings

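- PeNFAC surpasses state-of-the-art algorithms for learning deterministic policies on several classic control problems
- The CACLA/NFAC policy update is justified theoretically by relating it to another update whose objective function is made explicit
- The approach is extended into a new trust region algorithm, Penalized NFAC (PeNFAC)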

## Abstract

In the context of learning deterministic policies in continuous domains, we revisit an approach, which was first proposed in Continuous Actor Critic Learning Automaton (CACLA) and later extended in Neural Fitted Actor Critic (NFAC). This approach is based on a policy update different from that of deterministic policy gradient (DPG). Previous work has observed its excellent performance empirically, but a theoretical justification is lacking. To fill this gap, we provide a theoretical explanation to motivate this unorthodox policy update by relating it to another update and making explicit the objective function of the latter. We furthermore discuss in depth the properties of these updates to get a deeper understanding of the overall approach. In addition, we extend it and propose a new trust region algorithm, Penalized NFAC (PeNFAC). Finally, we experimentally demonstrate in several classic control problems that it surpasses the state-of-the-art algorithms to learn deterministic policies.
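To make the kind of update discussed in the abstract concrete, here is a minimal sketch of a CACLA-style actor step. It is an illustration under stated assumptions, not the paper's implementation: the linear policy, the function names `policy` and `cacla_actor_step`, the learning rate, and the use of a TD error as the advantage estimate are all choices made for this example. The trust region penalty of PeNFAC is not shown.

```python
def policy(weights, state):
    """Hypothetical linear deterministic policy: pi(s) = w . s (scalar action)."""
    return sum(w * x for w, x in zip(weights, state))


def cacla_actor_step(weights, state, action, td_error, lr=0.1):
    """One CACLA-style actor update (illustrative sketch).

    Only the sign of the advantage estimate (here a TD error) decides
    whether an update happens; its magnitude is ignored.
    """
    if td_error <= 0.0:
        # The explored action looked no better than expected: leave the actor unchanged.
        return list(weights)
    # The explored action looked better than expected: move pi(s) toward it
    # with one gradient step on 0.5 * (action - pi(s))**2.
    error = action - policy(weights, state)
    return [w + lr * error * x for w, x in zip(weights, state)]


# Usage: an explored action with a positive TD error pulls the policy toward it.
w0 = [0.0, 0.0]
s = [1.0, 0.5]
a_explored = 0.8
w_pos = cacla_actor_step(w0, s, a_explored, td_error=0.3)
w_neg = cacla_actor_step(w0, s, a_explored, td_error=-0.3)
```

By contrast, a DPG-style update would move the policy along the gradient of a learned critic with respect to the action, rather than toward a sampled action.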

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1906.04556/full.md

## Figures

23 figures with captions in the complete paper: https://tomesphere.com/paper/1906.04556/full.md

## References

19 references — full list in the complete paper: https://tomesphere.com/paper/1906.04556/full.md

---
Source: https://tomesphere.com/paper/1906.04556