# Practical Open-Loop Optimistic Planning

**Authors:** Edouard Leurent, Odalric-Ambrym Maillard

arXiv: 1904.04700 · 2019-04-10

## TL;DR

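This paper introduces KL-OLOP, a variant of the Open-Loop Optimistic Planning (OLOP) algorithm for Markov Decision Processes. It uses tighter upper-confidence bounds to improve practical performance while retaining OLOP's sample complexity bound, and comes with an efficient implementation that lowers the time complexity of both algorithms.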

## Contribution

The paper proposes KL-OLOP, a modification of OLOP that replaces its upper-confidence bounds with tighter ones, and an efficient implementation that improves the time complexity of both OLOP and KL-OLOP.
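To illustrate what "tighter upper-confidence bounds" can mean, the sketch below compares a Hoeffding-style bound with a Kullback-Leibler bound for rewards in [0, 1]. It is a minimal sketch under assumptions: this summary does not give the paper's exact bound or confidence threshold, so the function names, the `threshold` parameter, and the Bernoulli form of the divergence are hypothetical choices borrowed from the standard kl-UCB construction.

```python
import math


def bernoulli_kl(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q), clipped for stability."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def hoeffding_ucb(mean, count, threshold):
    """Hoeffding-style bound: empirical mean plus a symmetric bonus (may exceed 1)."""
    return mean + math.sqrt(threshold / (2 * count))


def kl_ucb(mean, count, threshold, tol=1e-9):
    """Largest q in [mean, 1] with count * kl(mean, q) <= threshold, found by bisection."""
    lo, hi = mean, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if count * bernoulli_kl(mean, mid) <= threshold:
            lo = mid  # mid is still plausible: move the lower end up
        else:
            hi = mid  # mid is ruled out: move the upper end down
    return lo


# Hypothetical numbers: empirical mean 0.9 after 10 samples, threshold 2.0.
print(hoeffding_ucb(0.9, 10, 2.0), kl_ucb(0.9, 10, 2.0))
```

By Pinsker's inequality the KL bound never exceeds the Hoeffding bound for the same threshold, and it always stays inside [0, 1], which is one way a bound can be less conservative near the ends of the reward range.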

## Key findings

- OLOP is overly conservative in practice, and KL-OLOP outperforms it in numerical experiments.
- KL-OLOP retains the sample complexity bound of OLOP.
- The proposed efficient implementation significantly improves the time complexity of both OLOP and KL-OLOP.

## Abstract

We consider the problem of online planning in a Markov Decision Process when given only access to a generative model, restricted to open-loop policies - i.e. sequences of actions - and under budget constraint. In this setting, the Open-Loop Optimistic Planning (OLOP) algorithm enjoys good theoretical guarantees but is overly conservative in practice, as we show in numerical experiments. We propose a modified version of the algorithm with tighter upper-confidence bounds, KL-OLOP, that leads to better practical performances while retaining the sample complexity bound. Finally, we propose an efficient implementation that significantly improves the time complexity of both algorithms.
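The planning setting described in the abstract can be sketched with a deliberately naive baseline: spend a fixed budget of rollouts evenly over all action sequences of a given length, then play the first action of the best one. This is not OLOP or KL-OLOP, which allocate the budget optimistically rather than uniformly. The `step` generative-model interface, the function names, the toy dynamics, and the choice to count the budget in rollouts are all assumptions made for illustration.

```python
import itertools


def rollout(step, state, actions, gamma):
    """Discounted return of one open-loop action sequence, sampled from the generative model."""
    total, discount = 0.0, 1.0
    for action in actions:
        state, reward = step(state, action)
        total += discount * reward
        discount *= gamma
    return total


def uniform_open_loop_plan(step, state, n_actions, horizon, budget, gamma=0.9):
    """Split `budget` rollouts evenly over all sequences; return the best first action."""
    sequences = list(itertools.product(range(n_actions), repeat=horizon))
    if budget < len(sequences):
        raise ValueError("budget too small to try every sequence once")
    per_sequence = budget // len(sequences)

    def estimate(sequence):
        returns = [rollout(step, state, sequence, gamma) for _ in range(per_sequence)]
        return sum(returns) / per_sequence

    best = max(sequences, key=estimate)
    return best[0]


def toy_step(state, action):
    """Hypothetical deterministic model: action 1 always yields reward 1."""
    return state + action, float(action == 1)


print(uniform_open_loop_plan(toy_step, state=0, n_actions=2, horizon=3, budget=8))
```

The number of sequences grows exponentially with the horizon, which is why a uniform split becomes wasteful quickly and why optimistic allocation based on upper-confidence bounds is attractive in this setting.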

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1904.04700/full.md

## Figures

10 figures with captions in the complete paper: https://tomesphere.com/paper/1904.04700/full.md

## References

15 references — full list in the complete paper: https://tomesphere.com/paper/1904.04700/full.md

---
Source: https://tomesphere.com/paper/1904.04700