Logarithmic regret bounds for continuous-time average-reward Markov decision processes

Xuefeng Gao; Xun Yu Zhou

arXiv:2205.11168 · cs.LG · July 3, 2024 · 1 citation

TL;DR

This paper establishes logarithmic regret bounds for reinforcement learning in continuous-time Markov decision processes in the average-reward setting: instance-dependent lower bounds, and a learning algorithm whose finite-time regret bound achieves the same logarithmic growth rate.

Contribution

It provides the first instance-dependent regret lower bounds and a corresponding learning algorithm for continuous-time MDPs, extending RL theory beyond discrete-time models.

Findings

01

Instance-dependent regret lower bounds are logarithmic in the time horizon.

02

A new learning algorithm achieves logarithmic regret growth.

03

The analysis combines upper confidence reinforcement learning, a careful estimation of the mean holding times, and stochastic comparison of point processes.
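
To make the regret notion behind these findings concrete, here is a minimal simulation sketch. It rests on assumptions of ours, not details taken from the paper: reward accrues at rate r(s, a) while the process sits in a state, regret up to time T is ρ* T minus the reward collected by time T, and every policy's embedded jump chain is irreducible. The model `P`, `RATE`, `R` and the helpers `average_reward`, `collected_reward`, `regret` are hypothetical. The gain formula is the standard renewal-reward ratio for semi-Markov processes.

```python
import itertools
import random

# Hypothetical 2-state, 2-action CTMDP (illustrative numbers, not from the paper).
# P[s][a][s2]: probability that the next state is s2 after leaving s under action a
# RATE[s][a]:  rate lambda(s, a) of the exponential holding time in s
# R[s][a]:     reward earned per unit time while sitting in s under action a
P = [[[0.5, 0.5], [0.0, 1.0]],
     [[1.0, 0.0], [0.5, 0.5]]]
RATE = [[1.0, 2.0], [1.0, 4.0]]
R = [[1.0, 0.5], [0.0, 0.2]]


def average_reward(policy, iters=5000):
    """Long-run reward per unit time of a deterministic stationary policy.

    rho = sum_s nu(s) R(s) / lambda(s)  /  sum_s nu(s) / lambda(s),
    where nu is the stationary law of the embedded jump chain (assumed irreducible).
    """
    n = len(policy)
    Q = [P[s][policy[s]] for s in range(n)]
    nu = [1.0 / n] * n
    for _ in range(iters):
        # The lazy chain (I + Q) / 2 has the same stationary law and is aperiodic.
        step = [sum(nu[i] * Q[i][j] for i in range(n)) for j in range(n)]
        nu = [0.5 * (nu[j] + step[j]) for j in range(n)]
    mean_hold = [1.0 / RATE[s][policy[s]] for s in range(n)]
    num = sum(nu[s] * R[s][policy[s]] * mean_hold[s] for s in range(n))
    den = sum(nu[s] * mean_hold[s] for s in range(n))
    return num / den


def collected_reward(policy, horizon, s0=0, seed=0):
    """Simulate the process up to time `horizon` and integrate the reward rate."""
    rng = random.Random(seed)
    t, s, total = 0.0, s0, 0.0
    while t < horizon:
        a = policy[s]
        hold = rng.expovariate(RATE[s][a])
        total += R[s][a] * min(hold, horizon - t)  # truncate the last sojourn at T
        t += hold
        s = rng.choices(range(len(P)), weights=P[s][a])[0]
    return total


# Optimal gain rho* by brute force over all deterministic stationary policies.
policies = list(itertools.product(range(2), repeat=2))
rho_star, best = max((average_reward(pi), pi) for pi in policies)


def regret(policy, horizon, seed=0):
    """Regret(T) = rho* * T minus the reward actually collected by time T."""
    return rho_star * horizon - collected_reward(policy, horizon, seed=seed)
```

Playing the optimal policy `best` keeps `regret(best, T) / T` close to zero, while a fixed suboptimal policy such as `(1, 0)` pays roughly ρ* − ρ per unit time, so its regret grows linearly in T. A learner that does not know `P` and `RATE` has to explore, and per the findings above its regret grows at least logarithmically in T for some instances, with the paper's algorithm attaining logarithmic growth.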

Abstract

We consider reinforcement learning for continuous-time Markov decision processes (MDPs) in the infinite-horizon, average-reward setting. In contrast to discrete-time MDPs, a continuous-time process moves to a state and stays there for a random holding time after an action is taken. With unknown transition probabilities and rates of exponential holding times, we derive instance-dependent regret lower bounds that are logarithmic in the time horizon. Moreover, we design a learning algorithm and establish a finite-time regret bound that achieves the logarithmic growth rate. Our analysis builds upon upper confidence reinforcement learning, a delicate estimation of the mean holding times, and stochastic comparison of point processes.
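
The abstract singles out the estimation of mean holding times. The paper's own estimator is not spelled out here, so the following is only a hedged sketch of one standard approach, not the authors' method. It builds a two-sided confidence interval for the mean of i.i.d. exponential holding times observed at a single state-action pair by inverting the Chernoff bounds for a sum of exponentials, using the Exp(1) rate function I(a) = a − 1 − ln a. The function names `holding_time_interval` and `_exp_rate_fn` are hypothetical. An upper-confidence learner could recompute such an interval for each state-action pair as visits accumulate.

```python
import math
import random


def _exp_rate_fn(a):
    # Cramer rate function of the Exp(1) law: I(a) = a - 1 - ln(a), with I(1) = 0.
    return a - 1.0 - math.log(a)


def holding_time_interval(samples, delta, iters=200):
    """Two-sided (1 - delta) confidence interval for the mean of i.i.d. exponential
    holding times, obtained by inverting Chernoff bounds on their sample mean.
    Assumes 0 < delta < 1 and at least one sample."""
    n = len(samples)
    mean = sum(samples) / n
    target = math.log(2.0 / delta) / n  # each tail gets probability at most delta / 2

    # Upper factor a_hi > 1 with I(a_hi) >= target (I is increasing on [1, inf)).
    lo, hi = 1.0, 2.0
    while _exp_rate_fn(hi) < target:
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if _exp_rate_fn(mid) < target:
            lo = mid
        else:
            hi = mid
    a_hi = hi

    # Lower factor a_lo in (0, 1) with I(a_lo) >= target (I is decreasing on (0, 1]).
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if _exp_rate_fn(mid) > target:  # mid is still far enough below 1
            lo = mid
        else:
            hi = mid
    a_lo = lo

    # With probability >= 1 - delta: mu * a_lo < mean < mu * a_hi; solve for mu.
    return mean / a_hi, mean / a_lo
```

Since I(a) ≈ (a − 1)²/2 near a = 1, the relative half-width of the interval shrinks roughly like sqrt(2 ln(2/δ)/n) as the number n of observed holding times grows.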
