Near-continuous time Reinforcement Learning for continuous state-action spaces

Lorenzo Croissant (CEREMADE); Marc Abeille; Bruno Bouchard (CEREMADE)

arXiv:2309.02815·cs.AI·September 7, 2023

TL;DR

This paper introduces a reinforcement learning framework for controlling unknown dynamical systems in near-continuous time. A Poisson clock of frequency $ε^{-1}$ models the interaction times, covering time scales from discrete to continuous, and the resulting algorithm has regret of order $ε^{1/2} T + \sqrt{T}$.

Contribution

It models high-frequency interactions with a Poisson clock, extends RL to continuous time and state spaces, and proposes an approximate planning method with provable regret guarantees.

Findings

Regret of order $ε^{1/2} T + \sqrt{T}$

Approximate planning via a diffusive limit is effective

Regret approaches order $\sqrt{T}$ in the near-continuous-time limit ($ε \to 0$), where the $ε^{1/2} T$ term vanishes
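A short worked calculation makes the two regimes of the stated bound explicit. It uses only the order $ε^{1/2} T + \sqrt{T}$ given above and ignores constants and logarithmic factors, which is an assumption of this sketch rather than a statement from the paper.

```latex
\[
\frac{ε^{1/2} T + \sqrt{T}}{T} \;=\; ε^{1/2} + T^{-1/2},
\qquad
ε^{1/2} T \le \sqrt{T} \iff ε \le T^{-1}.
\]
```

Read this way, for a fixed $ε$ the per-unit-time bound tends to $ε^{1/2}$ as $T$ grows, while the term $ε^{1/2} T$ is dominated by $\sqrt{T}$ once $ε \le 1/T$, which is the sense in which the bound is of order $\sqrt{T}$ in near-continuous time.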

Abstract

We consider the Reinforcement Learning problem of controlling an unknown dynamical system to maximise the long-term average reward along a single trajectory. Most of the literature considers system interactions that occur in discrete time and discrete state-action spaces. Although this standpoint is suitable for games, it is often inadequate for mechanical or digital systems in which interactions occur at a high frequency, if not in continuous time, and whose state spaces are large if not inherently continuous. Perhaps the only exception is the Linear Quadratic framework for which results exist both in discrete and continuous time. However, its ability to handle continuous states comes with the drawback of a rigid dynamic and reward structure. This work aims to overcome these shortcomings by modelling interaction times with a Poisson clock of frequency $ε^{-1}$, which captures…
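The Poisson clock of frequency $ε^{-1}$ mentioned in the abstract can be illustrated with a minimal simulation. A Poisson clock with rate $ε^{-1}$ has independent exponential waiting times with mean $ε$, so smaller $ε$ gives more frequent interactions. The function name `poisson_clock_times` and the parameters `eps`, `horizon` and `seed` are hypothetical names chosen for this sketch; it models only the interaction times, not the state dynamics or the paper's algorithm.

```python
import random


def poisson_clock_times(eps, horizon, seed=0):
    """Sample the ring times in [0, horizon] of a Poisson clock with rate 1/eps.

    Hypothetical helper for illustration only: waiting times between rings
    are i.i.d. exponential with mean eps.
    """
    rng = random.Random(seed)
    rate = 1.0 / eps  # random.expovariate expects the rate, not the mean
    times = []
    t = rng.expovariate(rate)
    while t <= horizon:
        times.append(t)
        t += rng.expovariate(rate)
    return times


if __name__ == "__main__":
    # The number of interactions is random, with mean horizon / eps.
    for eps in (1.0, 0.1, 0.01):
        n = len(poisson_clock_times(eps, horizon=100.0))
        print(f"eps={eps}: {n} interactions (expected about {100.0 / eps:.0f})")
```

With `eps = 1.0` the clock rings about once per unit of time, which loosely corresponds to the discrete-time end of the scale, while shrinking `eps` packs more interactions into the same horizon.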

Taxonomy

Topics: Reinforcement Learning in Robotics · Receptor Mechanisms and Signaling · Advanced Bandit Algorithms Research