# Similarities between policy gradient methods (PGM) in Reinforcement learning (RL) and supervised learning (SL)

**Authors:** Eric Benhamou

arXiv: 1904.06260 · 2019-05-03

## TL;DR

This paper reveals that policy gradient methods in reinforcement learning can be viewed as supervised learning problems by replacing true labels with discounted rewards, highlighting a fundamental connection between the two learning paradigms.

## Contribution

The paper provides a new proof linking policy gradient methods to supervised learning, emphasizing their relationship through reward functions and cross-entropy.
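To make the link concrete, the two objectives can be written side by side. This is a sketch in the standard REINFORCE notation, which is an assumption here: the symbols $\pi_\theta$, $G_t$, $\gamma$, and the $1/N$, $1/T$ normalizations are not taken from the paper, whose exact formulation is not reproduced in this summary.

```latex
% Supervised learning: cross-entropy on N labelled pairs (x_i, y_i)
\mathcal{L}_{\mathrm{SL}}(\theta) = -\frac{1}{N} \sum_{i=1}^{N} \log \pi_\theta(y_i \mid x_i)

% Policy gradient surrogate: same log-likelihood term on state-action pairs
% (s_t, a_t), each weighted by the discounted return G_t
\mathcal{L}_{\mathrm{PG}}(\theta) = -\frac{1}{T} \sum_{t=0}^{T-1} G_t \, \log \pi_\theta(a_t \mid s_t),
\qquad
G_t = \sum_{k=t}^{T-1} \gamma^{\,k-t} r_k
```

With $G_t$ held fixed, the gradient of $\mathcal{L}_{\mathrm{PG}}$ is the negative of the usual REINFORCE estimate $\frac{1}{T}\sum_t G_t \nabla_\theta \log \pi_\theta(a_t \mid s_t)$. Setting every $G_t = 1$ and treating the chosen actions as labels recovers $\mathcal{L}_{\mathrm{SL}}$ exactly.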

## Key findings

- Policy gradient methods can be reformulated as a supervised learning problem in which discounted rewards take the place of true labels.
- A simple experiment interchanges labels and pseudo rewards to illustrate the correspondence.
- Carefully modified reward functions could establish further connections with supervised learning.
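The first finding can be illustrated with a minimal pure-Python sketch. It assumes the standard REINFORCE surrogate loss rather than the paper's own derivation, and every function name below (`discounted_returns`, `weighted_cross_entropy`, `supervised_loss`, `policy_gradient_loss`) is hypothetical, chosen only for this illustration. Both losses share one weighted cross-entropy routine and differ only in the per-step weights.

```python
import math


def discounted_returns(rewards, gamma=0.99):
    """Return G_t = sum_{k>=t} gamma^(k-t) * r_k for every step t."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Accumulate from the last step backwards so each G_t reuses G_{t+1}.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def log_softmax(logits):
    """Numerically stable log-probabilities for one vector of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def weighted_cross_entropy(logits_per_step, targets, weights):
    """Mean over steps of -w_t * log p(target_t)."""
    total = 0.0
    for logits, target, w in zip(logits_per_step, targets, weights):
        total += -w * log_softmax(logits)[target]
    return total / len(targets)


def supervised_loss(logits_per_step, labels):
    """Ordinary cross-entropy: every example has weight 1."""
    return weighted_cross_entropy(logits_per_step, labels, [1.0] * len(labels))


def policy_gradient_loss(logits_per_step, actions, rewards, gamma=0.99):
    """REINFORCE-style surrogate (assumed form): the taken actions play the
    role of labels and the discounted returns play the role of weights."""
    weights = discounted_returns(rewards, gamma)
    return weighted_cross_entropy(logits_per_step, actions, weights)
```

When every reward is 1 and `gamma` is 0, each return equals 1 and `policy_gradient_loss` coincides with `supervised_loss` on the same logits and targets. Any other reward signal only rescales how strongly each "label" is fitted.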

## Abstract

Reinforcement learning (RL) is about sequential decision making and is traditionally opposed to supervised learning (SL) and unsupervised learning (USL). In RL, given the current state, the agent makes a decision that may influence the next state, as opposed to SL (and USL), where the next state remains the same regardless of the decisions taken, in either batch or online learning. Although this difference between SL and RL is fundamental, there are connections that have been overlooked. In particular, we prove in this paper that policy gradient methods can be cast as a supervised learning problem where true labels are replaced with discounted rewards. We provide a new proof of policy gradient methods (PGM) that emphasizes the tight link with cross-entropy and supervised learning. We provide a simple experiment where we interchange labels and pseudo rewards. We conclude that other relationships with SL could be made if we modify the reward functions wisely.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1904.06260/full.md

## Figures

1 figure with captions in the complete paper: https://tomesphere.com/paper/1904.06260/full.md

## References

19 references — full list in the complete paper: https://tomesphere.com/paper/1904.06260/full.md

---
Source: https://tomesphere.com/paper/1904.06260