# Revisiting stochastic off-policy action-value gradients

**Authors:** Yemi Okesanjo, Victor Kofia

arXiv: 1703.02102 · 2017-03-14

## TL;DR

This paper briefly discusses the off-policy stochastic counterpart to deterministic action-value gradients and describes an incremental approach for following the policy gradient in lieu of the natural gradient.

## Contribution

The paper discusses an off-policy stochastic action-value gradient method, the stochastic counterpart to deterministic action-value gradients, together with an incremental approach for policy updates that follows the policy gradient instead of the natural gradient.

## Key findings

- Analysis of off-policy stochastic action-value gradients
- Proposal of an incremental policy gradient method
- Discussion of how the incremental approach relates to natural gradient methods

## Abstract

Off-policy stochastic actor-critic methods rely on approximating the stochastic policy gradient in order to derive an optimal policy. One may also derive the optimal policy by approximating the action-value gradient. The use of action-value gradients is desirable as policy improvement occurs along the direction of steepest ascent. This has been studied extensively within the context of natural gradient actor-critic algorithms and more recently within the context of deterministic policy gradients. In this paper we briefly discuss the off-policy stochastic counterpart to deterministic action-value gradients, as well as an incremental approach for following the policy gradient in lieu of the natural gradient.
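
To make the idea of an action-value gradient concrete, here is a minimal sketch, not the authors' algorithm. It assumes a one-dimensional Gaussian policy `a = theta * s + sigma * eps`, a hand-written toy critic `Q(s, a) = -(a - w * s)^2`, and states drawn from a uniform distribution standing in for a behavior policy's state distribution. The function names, the critic, and all constants are hypothetical and chosen only for illustration. The policy parameter is updated along the chain-rule product of the critic's gradient with respect to the action and the action's gradient with respect to `theta`.

```python
import numpy as np

def action_value_gradient(theta, states, w=2.0, sigma=0.1, rng=None):
    """Monte Carlo estimate of d/dtheta E[Q(s, a)] for a = theta*s + sigma*eps.

    Assumption: the critic is the known toy function Q(s, a) = -(a - w*s)^2,
    so its action gradient is available in closed form.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    eps = rng.standard_normal(states.shape)
    # Stochastic action sampled from the target policy (reparameterized form).
    actions = theta * states + sigma * eps
    # Gradient of the toy critic with respect to the action.
    dq_da = -2.0 * (actions - w * states)
    # Gradient of the sampled action with respect to the policy parameter.
    da_dtheta = states
    return np.mean(dq_da * da_dtheta)

def improve_policy(theta=0.0, steps=200, lr=0.05, seed=0):
    """Gradient ascent on theta using the action-value gradient estimate."""
    rng = np.random.default_rng(seed)
    for _ in range(steps):
        # Stand-in for states visited by a separate behavior policy.
        states = rng.uniform(-1.0, 1.0, size=64)
        theta += lr * action_value_gradient(theta, states, rng=rng)
    return theta
```

In this toy setting the critic is maximized when the mean action equals `w * s`, so calling `improve_policy()` should move `theta` close to `w`. A practical actor-critic method would learn the critic from data rather than assume it.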

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1703.02102/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/1703.02102/full.md

## References

13 references — full list in the complete paper: https://tomesphere.com/paper/1703.02102/full.md

---
Source: https://tomesphere.com/paper/1703.02102