# A Tractable Algorithm For Finite-Horizon Continuous Reinforcement Learning

**Authors:** Phanideep Gampa, Sairam Satwik Kondamudi, Lakshmanan Kailasam

arXiv: 1906.11245 · 2019-08-05

## TL;DR

This paper introduces a tractable optimistic value iteration algorithm for finite-horizon continuous reinforcement learning, establishes regret lower bounds, analyzes discretization errors, and validates findings through experiments.

## Contribution

It presents a new algorithm for finite-horizon continuous RL, derives regret lower bounds, and analyzes discretization errors under Hölder continuity assumptions.

## Key findings

- Proposed a tractable optimistic value iteration algorithm.
- Established a regret lower bound of $\Omega(T^{2/3})$ for any algorithm that discretizes the state space.
- Analyzed discretization error bounds under Hölder continuity.
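
To make the first finding more concrete, the following is a minimal sketch of optimistic value iteration over a discretized state space. It is not the paper's algorithm: the state space $[0, 1]$, the rewards in $[0, 1]$, the count-based bonus `bonus_scale / sqrt(count)`, and all function and parameter names are assumptions chosen for illustration.

```python
import numpy as np

def discretize(state, n):
    """Map a state in [0, 1] to one of n equal-width intervals (assumed state space)."""
    return min(int(state * n), n - 1)

def optimistic_value_iteration(counts, reward_sums, trans_counts, horizon, bonus_scale=1.0):
    """Finite-horizon backward induction with a count-based exploration bonus.

    counts:       (n, A) visit counts per (interval, action)
    reward_sums:  (n, A) summed observed rewards, each reward assumed in [0, 1]
    trans_counts: (n, A, n) observed interval-to-interval transition counts
    """
    n, _ = counts.shape
    safe = np.maximum(counts, 1)  # avoid division by zero for unvisited pairs
    r_hat = reward_sums / safe
    # Empirical transitions; unvisited pairs fall back to a uniform guess (assumption).
    p_hat = np.where(counts[..., None] > 0, trans_counts / safe[..., None], 1.0 / n)
    bonus = bonus_scale / np.sqrt(safe)  # illustrative bonus, not the paper's confidence set

    values = np.zeros((horizon + 1, n))
    policy = np.zeros((horizon, n), dtype=int)
    for h in range(horizon - 1, -1, -1):
        q = r_hat + bonus + p_hat @ values[h + 1]
        q = np.minimum(q, horizon - h)  # rewards in [0, 1] cap the remaining return
        policy[h] = q.argmax(axis=1)
        values[h] = q.max(axis=1)
    return values, policy

# Usage: with no data, every estimate is maximally optimistic.
n, num_actions, horizon = 4, 2, 3
values, policy = optimistic_value_iteration(
    np.zeros((n, num_actions)),
    np.zeros((n, num_actions)),
    np.zeros((n, num_actions, n)),
    horizon,
)
```

With empty counts the bonus dominates, so each interval's value is clipped to the remaining horizon. As data accumulates the bonus shrinks and the estimates move toward the empirical model.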

## Abstract

We consider the finite horizon continuous reinforcement learning problem. Our contribution is three-fold. First, we give a tractable algorithm based on optimistic value iteration for the problem. Next, we give a lower bound on regret of order $\Omega(T^{2/3})$ for any algorithm that discretizes the state space, improving the previous regret bound of $\Omega(T^{1/2})$ of Ortner and Ryabko \cite{contrl} for the same problem. Next, under the assumption that the rewards and transitions are Hölder continuous, we show that the upper bound on the discretization error is $\text{const} \cdot L n^{-\alpha} T$. Finally, we give some simple experiments to validate our propositions.
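
The shape of the discretization bound can be illustrated numerically. The sketch below assumes a state space $[0, 1]$ split into $n$ equal intervals and reads $L$ and $\alpha$ as the Hölder constant and exponent. It uses $f(x) = \sqrt{x}$, which is Hölder continuous with $L = 1$ and $\alpha = 1/2$, as a stand-in function. The helper names and the choice `const=1.0` are hypothetical, since the abstract does not state the constant. This is an intuition for the $L n^{-\alpha}$ factor, not the paper's proof.

```python
import numpy as np

def discretization_error_bound(L, alpha, n, T, const=1.0):
    """Evaluate const * L * n^(-alpha) * T; `const` is unspecified in the abstract."""
    return const * L * n ** (-alpha) * T

def max_within_interval_gap(f, n, samples_per_interval=50):
    """Largest variation of f inside any of n equal-width intervals of [0, 1]."""
    worst = 0.0
    for k in range(n):
        xs = np.linspace(k / n, (k + 1) / n, samples_per_interval)
        ys = f(xs)
        worst = max(worst, float(ys.max() - ys.min()))
    return worst

L, alpha, n, T = 1.0, 0.5, 16, 1000
gap = max_within_interval_gap(np.sqrt, n)             # variation lost by merging states
per_step = L * n ** (-alpha)                          # Hoelder bound on that variation
total = discretization_error_bound(L, alpha, n, T)    # accumulated over T steps
```

Within any interval of width $1/n$, a Hölder continuous function varies by at most $L n^{-\alpha}$, so `gap` never exceeds `per_step`. Accumulating that per-step error over $T$ steps gives a term linear in $T$, matching the form of the stated bound.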

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1906.11245/full.md

## Figures

18 figures with captions in the complete paper: https://tomesphere.com/paper/1906.11245/full.md

## References

13 references — full list in the complete paper: https://tomesphere.com/paper/1906.11245/full.md

---
Source: https://tomesphere.com/paper/1906.11245