# Optimistic Planning by Regularized Dynamic Programming

**Authors:** Antoine Moulin, Gergely Neu

arXiv: 2302.14004 · 2023-06-16

## TL;DR

This paper introduces a regularized dynamic programming approach to optimistic planning in infinite-horizon discounted Markov decision processes (MDPs). The approach yields a computationally efficient algorithm that learns near-optimal policies with near-optimal statistical guarantees.

## Contribution

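The paper adds regularization to the updates of an otherwise standard approximate value iteration procedure. This removes the need for the contraction and monotonicity arguments used in existing analyses of approximate dynamic programming. As a result, the planner can use transition models estimated by least squares, which extends optimistic planning to MDPs with linear function approximation.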
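As a rough illustration of what adding regularization to value iteration can look like, the recursion below assumes a KL penalty toward the previous policy and an additive exploration bonus. The symbols $\hat{P}$ (estimated transition model), $b$ (bonus), $\eta$ (regularization strength), and $\pi_k$ (current policy) are our own notation. This is a sketch under those assumptions, not necessarily the paper's exact update. The KL penalty replaces the hard maximum over actions with a log-sum-exp, and the maximizing policy is a multiplicative-weights (softmax) update:

$$
\begin{aligned}
Q_{k+1}(s,a) &= r(s,a) + b(s,a) + \gamma \sum_{s'} \hat{P}(s' \mid s,a)\, V_k(s'),\\
V_{k+1}(s) &= \max_{p \in \Delta(\mathcal{A})} \Big\{ \sum_a p(a)\, Q_{k+1}(s,a) - \tfrac{1}{\eta}\, \mathrm{KL}\big(p \,\|\, \pi_k(\cdot \mid s)\big) \Big\} = \frac{1}{\eta} \log \sum_a \pi_k(a \mid s)\, e^{\eta Q_{k+1}(s,a)},\\
\pi_{k+1}(a \mid s) &= \frac{\pi_k(a \mid s)\, e^{\eta Q_{k+1}(s,a)}}{\sum_{a'} \pi_k(a' \mid s)\, e^{\eta Q_{k+1}(s,a')}}.
\end{aligned}
$$

When $\pi_k(\cdot \mid s)$ has full support, the log-sum-exp tends to $\max_a Q_{k+1}(s,a)$ as $\eta \to \infty$, which recovers ordinary optimistic value iteration.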

## Key findings

- Achieves near-optimal statistical guarantees in discounted linear mixture MDPs.
- Provides a computationally efficient algorithm for learning near-optimal policies from a single stream of experience.
- Recovers known guarantees in tabular MDPs.
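A minimal pure-Python sketch of optimistic, KL-regularized value iteration in a tabular MDP follows. It assumes rewards in $[0, 1]$, an empirical transition model built from visit counts, and a Hoeffding-style bonus `bonus_scale / sqrt(n(s, a))`. These choices, the default constants, and the function names (`empirical_model`, `regularized_optimistic_vi`) are hypothetical illustrations, not the authors' algorithm or constants.

```python
import math


def empirical_model(transitions, n_states, n_actions):
    """Count-based estimate of P(s' | s, a) from observed (s, a, s') triples."""
    counts = [[[0] * n_states for _ in range(n_actions)] for _ in range(n_states)]
    visits = [[0] * n_actions for _ in range(n_states)]
    for s, a, s_next in transitions:
        counts[s][a][s_next] += 1
        visits[s][a] += 1
    # Unvisited pairs fall back to a uniform guess; the bonus keeps them optimistic.
    p_hat = [
        [
            [c / visits[s][a] if visits[s][a] else 1.0 / n_states for c in counts[s][a]]
            for a in range(n_actions)
        ]
        for s in range(n_states)
    ]
    return p_hat, visits


def regularized_optimistic_vi(reward, p_hat, visits, gamma=0.9, eta=5.0,
                              bonus_scale=1.0, n_iters=200):
    """KL-regularized value iteration with a count-based optimism bonus (hypothetical sketch)."""
    n_states, n_actions = len(reward), len(reward[0])
    v_max = 1.0 / (1.0 - gamma)  # assumes rewards in [0, 1]
    policy = [[1.0 / n_actions] * n_actions for _ in range(n_states)]
    v = [0.0] * n_states
    for _ in range(n_iters):
        new_policy, new_v = [], []
        for s in range(n_states):
            q = []
            for a in range(n_actions):
                bonus = bonus_scale / math.sqrt(max(1, visits[s][a]))
                backup = sum(p * v[t] for t, p in enumerate(p_hat[s][a]))
                q.append(min(reward[s][a] + bonus + gamma * backup, v_max))
            # Soft greedy step: V = (1/eta) log sum_a pi(a|s) exp(eta Q), shifted by max(q) for stability.
            m = max(q)
            weights = [policy[s][a] * math.exp(eta * (q[a] - m)) for a in range(n_actions)]
            z = sum(weights)
            new_v.append(m + math.log(z) / eta)
            new_policy.append([w / z for w in weights])  # pi_{k+1} proportional to pi_k * exp(eta Q)
        policy, v = new_policy, new_v
    return policy, v


# Toy 2-state, 2-action MDP: state 0 either stays (action 0) or moves to state 1 (action 1);
# in state 1, action 0 pays reward 1 and stays, while action 1 returns to state 0.
reward = [[0.0, 0.0], [1.0, 0.0]]
data = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)] * 50
p_hat, visits = empirical_model(data, n_states=2, n_actions=2)
policy, v = regularized_optimistic_vi(reward, p_hat, visits)
print([[round(p, 3) for p in row] for row in policy], [round(x, 3) for x in v])
```

One design note: by Jensen's inequality the log-sum-exp value lies between the policy-averaged Q-value and the largest clipped Q-value, so the iterates stay in $[0, 1/(1-\gamma)]$ without a separate projection step.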

## Abstract

We propose a new method for optimistic planning in infinite-horizon discounted Markov decision processes based on the idea of adding regularization to the updates of an otherwise standard approximate value iteration procedure. This technique allows us to avoid contraction and monotonicity arguments typically required by existing analyses of approximate dynamic programming methods, and in particular to use approximate transition functions estimated via least-squares procedures in MDPs with linear function approximation. We use our method to recover known guarantees in tabular MDPs and to provide a computationally efficient algorithm for learning near-optimal policies in discounted linear mixture MDPs from a single stream of experience, and show it achieves near-optimal statistical guarantees.
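In a linear mixture MDP the transition kernel is assumed to be $P(s' \mid s,a) = \langle \phi(s' \mid s,a), \theta^\star \rangle$ for a known feature map $\phi$ and an unknown $\theta^\star \in \mathbb{R}^d$. Hence $(PV)(s,a) = \langle \psi_V(s,a), \theta^\star \rangle$ with $\psi_V(s,a) = \sum_{s'} \phi(s' \mid s,a) V(s')$. The sketch below estimates $\theta^\star$ by ridge regression on value targets and adds an elliptical confidence width, in the style of value-targeted regression. The names `phi`, `lam`, `beta`, `value_features`, and `LinearMixtureEstimator`, as well as the toy kernels, are hypothetical. The sketch shows one standard least-squares construction, which may differ from the paper's exact procedure.

```python
import math
import random


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def value_features(phi, s, a, v, n_states):
    """psi_V(s, a) = sum over s' of phi(s' | s, a) * V(s')."""
    psi = [0.0] * len(phi(0, s, a))
    for s_next in range(n_states):
        for i, f in enumerate(phi(s_next, s, a)):
            psi[i] += f * v[s_next]
    return psi


class LinearMixtureEstimator:
    """Ridge regression of V(s_{t+1}) on psi_V(s_t, a_t) (hypothetical helper)."""

    def __init__(self, d, lam=1.0):
        self.gram = [[lam if i == j else 0.0 for j in range(d)] for i in range(d)]  # Lambda = lam * I
        self.target = [0.0] * d  # b = sum_t psi_t * y_t

    def update(self, psi, y):
        # Lambda += psi psi^T and b += psi * y
        for i, p_i in enumerate(psi):
            self.target[i] += p_i * y
            for j, p_j in enumerate(psi):
                self.gram[i][j] += p_i * p_j

    def theta(self):
        return solve(self.gram, self.target)  # theta_hat = Lambda^{-1} b

    def optimistic_backup(self, psi, beta):
        """Upper estimate of (P V)(s, a): <psi, theta_hat> + beta * ||psi||_{Lambda^{-1}}."""
        mean = sum(p * t for p, t in zip(psi, self.theta()))
        width = math.sqrt(max(0.0, sum(p * w for p, w in zip(psi, solve(self.gram, psi)))))
        return mean + beta * width


def phi(s_next, s, a):
    # Toy base kernels: kernel 1 moves to state a, kernel 2 moves to state 1 - a.
    return [1.0 if s_next == a else 0.0, 1.0 if s_next == 1 - a else 0.0]


# True mixture weights theta* = (0.8, 0.2): action a reaches state a with probability 0.8.
rng = random.Random(0)
v = [1.0, 0.0]  # value function used as the regression target
est = LinearMixtureEstimator(d=2, lam=1.0)
for _ in range(2000):
    s, a = rng.randrange(2), rng.randrange(2)
    s_next = a if rng.random() < 0.8 else 1 - a
    est.update(value_features(phi, s, a, v, 2), v[s_next])
print([round(t, 3) for t in est.theta()])
```

Because the toy features are one-hot, the Gram matrix stays diagonal here. In general, `solve` handles a dense $\Lambda$, and the width term shrinks as more data accumulates along the direction of $\psi$.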

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2302.14004/full.md

## References

36 references — full list in the complete paper: https://tomesphere.com/paper/2302.14004/full.md

---
Source: https://tomesphere.com/paper/2302.14004