Reinforcement Learning in a Birth and Death Process: Breaking the Dependence on the State Space

Jonatha Anselmi (POLARIS; LIG); Bruno Gaujal (POLARIS; LIG); Louis-Sébastien Rebuffi (POLARIS; LIG; UGA)

arXiv:2302.10667 · cs.LG · February 22, 2023

TL;DR

This paper shows that, in certain structured MDPs, reinforcement-learning regret bounds can be made independent of the size of the state space by exploiting the problem's structure, contrary to what diameter-based bounds suggest.

Contribution

The authors show that, for MDPs with a birth and death structure, the regret of a slightly modified UCRL2 algorithm admits an upper bound that does not depend on the number of states, breaking the usual dependence on the diameter.
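To see why the diameter can blow up in a queue with impatient jobs, here is a minimal sketch under hypothetical assumptions: an uncontrolled birth-and-death chain with constant arrival rate $\lambda$ and abandonment rate $n\theta$ in state $n$ (the paper's actual controlled model and rates are not reproduced here, and the function names are illustrative). By first-step analysis, the expected time $h_n$ to move from state $n$ to $n+1$ satisfies $h_n = 1/\lambda_n + (\mu_n/\lambda_n)\,h_{n-1}$, so with $\mu_n = n\theta$ it grows at least like $n!\,\theta^{n}/\lambda^{n+1}$. The diameter is a minimum over policies of worst-case hitting times, so this single-chain computation only illustrates the growth mechanism, not the paper's $\Omega(S^{S})$ bound.

```python
def climb_times(arrival_rate, abandon_rate, n_states):
    """Expected time h[n] to go from state n to state n + 1, for n = 0..n_states-2.

    Hypothetical chain: birth rate `arrival_rate` in every state, death rate
    n * `abandon_rate` in state n (each waiting job abandons independently).
    """
    h = []
    for n in range(n_states - 1):
        birth = arrival_rate
        death = n * abandon_rate
        previous = h[-1] if h else 0.0
        # First-step analysis: h_n = 1/lambda_n + (mu_n / lambda_n) * h_{n-1}.
        h.append(1.0 / birth + (death / birth) * previous)
    return h


def hitting_time_to_top(arrival_rate, abandon_rate, n_states):
    """Expected time to reach state n_states - 1 from state 0 (sum of climb times)."""
    return sum(climb_times(arrival_rate, abandon_rate, n_states))


if __name__ == "__main__":
    for s in (5, 10, 15, 20):
        print(s, hitting_time_to_top(1.0, 1.0, s))
```

With both hypothetical rates equal to 1, the expected time to reach the top state from the empty state is 24 for 5 states and 125673 for 10 states, and it keeps growing factorially with the number of states.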

Findings

01

Regret bound is $\tilde{O}(\sqrt{E_2 A T})$, with $E_2$ bounded independently of the number of states.

02

Existing regret bounds of order $O(\sqrt{DSAT})$ suggest that learning is inefficient here, because the diameter $D$ is $\Omega(S^{S})$; this work overcomes that.

03

The approach relies on analyzing non-uniform state visitations in structured MDPs.
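The first and third findings rest on the fact that a reasonable policy rarely visits most of the state space. The sketch below uses the same kind of hypothetical chain (constant arrival rate $\lambda$, abandonment rate $n\theta$, arrivals blocked in the top state; not the paper's controlled queue) and computes its stationary measure by detailed balance, $\pi_n \mu_n = \pi_{n-1}\lambda_{n-1}$, together with its plain second moment. The paper's $E_2$ is a weighted second moment of the stationary measure of a reference policy whose exact definition is not reproduced on this page, so `second_moment` here is only an illustrative stand-in.

```python
def stationary_measure(arrival_rate, abandon_rate, n_states):
    """Stationary distribution of the hypothetical birth-and-death chain on
    {0, ..., n_states - 1}: birth rate `arrival_rate` (arrivals blocked in the
    top state) and death rate n * `abandon_rate` in state n.

    Detailed balance gives pi[n] / pi[n - 1] = lambda_{n-1} / mu_n.
    """
    weights = [1.0]
    for n in range(1, n_states):
        weights.append(weights[-1] * arrival_rate / (n * abandon_rate))
    total = sum(weights)
    return [w / total for w in weights]


def second_moment(pi):
    """Unweighted second moment sum_n n^2 pi[n]; a stand-in for E_2 only."""
    return sum(n * n * p for n, p in enumerate(pi))


if __name__ == "__main__":
    for s in (10, 100, 1000):
        pi = stationary_measure(2.0, 1.0, s)
        print(s, second_moment(pi))
```

Here the stationary measure is a Poisson law with mean $\rho = \lambda/\theta$ truncated to the state space, so as the number of states grows the second moment converges to $\rho + \rho^2$ (6 for $\rho = 2$) instead of growing with $S$, while the top states receive almost no mass.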

Abstract

In this paper, we revisit the regret of undiscounted reinforcement learning in MDPs with a birth and death structure. Specifically, we consider a controlled queue with impatient jobs and the main objective is to optimize a trade-off between energy consumption and user-perceived performance. Within this setting, the diameter $D$ of the MDP is $\Omega(S^{S})$, where $S$ is the number of states. Therefore, the existing lower and upper bounds on the regret at time $T$, of order $O(\sqrt{DSAT})$ for MDPs with $S$ states and $A$ actions, may suggest that reinforcement learning is inefficient here. In our main result, however, we exploit the structure of our MDPs to show that the regret of a slightly-tweaked version of the classical learning algorithm UCRL2 is in fact upper bounded by $\tilde{O}(\sqrt{E_{2} A T})$ where $E_{2}$ is related to the weighted second moment of the…
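For readers unfamiliar with UCRL2, here is a minimal sketch of its optimistic step as described in the original algorithm of Jaksch, Ortner and Auer (2010); it does not include the tweak introduced in this paper, which the abstract does not specify. For each state-action pair, UCRL2 considers every transition vector within an $L_1$ ball around the empirical estimate $\hat p$ and picks the one maximizing the expected next-state value: it moves up to half the radius of extra mass onto the most valuable state and removes the excess from the least valuable states. The radius constant (14) follows the original UCRL2 paper and should be checked against it; the function names are hypothetical.

```python
import math


def transition_radius(n_states, n_actions, t, visits, delta):
    """L1 confidence radius on p(. | s, a) in the original UCRL2:
    sqrt(14 * S * log(2 * A * t / delta) / max(1, N(s, a)))."""
    return math.sqrt(14 * n_states * math.log(2 * n_actions * t / delta)
                     / max(1, visits))


def optimistic_transition(p_hat, values, radius):
    """Return the p in {p in simplex : ||p - p_hat||_1 <= radius} maximizing
    sum_j p[j] * values[j] (inner step of extended value iteration)."""
    order = sorted(range(len(values)), key=lambda j: values[j], reverse=True)
    p = list(p_hat)
    best = order[0]
    # Put up to radius / 2 of extra mass on the most valuable next state.
    p[best] = min(1.0, p_hat[best] + radius / 2)
    # Remove the resulting excess from the least valuable states first.
    for j in reversed(order):
        excess = sum(p) - 1.0
        if excess <= 1e-12:
            break
        p[j] = max(0.0, p[j] - excess)
    return p


if __name__ == "__main__":
    p_hat = [0.5, 0.5, 0.0]
    values = [0.0, 1.0, 2.0]
    p = optimistic_transition(p_hat, values, radius=0.4)
    print(p, sum(pj * vj for pj, vj in zip(p, values)))
```

On this toy input, the optimistic vector moves 0.2 of probability mass from the lowest-value state to the highest-value one, which is exactly half the radius; extended value iteration repeats this step for every state-action pair.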

Videos

Reinforcement Learning in a Birth and Death Process: Breaking the Dependence on the State Space · SlidesLive

Taxonomy

Topics: Smart Grid Energy Management · Age of Information Optimization · Advanced Bandit Algorithms Research