# Classical Policy Gradient: Preserving Bellman's Principle of Optimality

**Authors:** Philip S. Thomas, Scott M. Jordan, Yash Chandak, Chris Nota, James Kostas

arXiv: 1906.03063 · 2019-06-10

## TL;DR

This paper introduces a new objective function for finite-horizon episodic Markov decision processes that aligns more closely with Bellman's principle of optimality, along with its gradient expression.

## Contribution

It presents a new objective function for finite-horizon episodic MDPs, designed to better capture Bellman's principle of optimality, together with an expression for its gradient.

## Key findings

- New objective function better captures Bellman's principle of optimality
- Derived explicit gradient expression for the new objective
- Potential improvements in policy optimization accuracy

## Abstract

We propose a new objective function for finite-horizon episodic Markov decision processes that better captures Bellman's principle of optimality, and provide an expression for the gradient of the objective.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1906.03063/full.md

## References

7 references — full list in the complete paper: https://tomesphere.com/paper/1906.03063/full.md

---
Source: https://tomesphere.com/paper/1906.03063