Reinforcement Learning and Regret Bounds for Admission Control

Lucas Weber; Ana Bušić; Jiamin Zhu

arXiv:2406.04766 · cs.LG · June 10, 2024

TL;DR

This paper analyzes regret bounds for reinforcement learning in admission control to queueing systems. It proposes an algorithm that exploits the problem structure to achieve a smaller regret bound than is possible for general MDPs.

Contribution

It introduces a UCRL2-inspired reinforcement learning algorithm for admission control in queueing systems. The algorithm exploits problem-specific structure to improve the regret bound.

Findings

01

Exploiting the structure of the queueing problem yields smaller regret bounds than the general lower bound for MDPs.

02

In the finite-server case, the expected total regret of the proposed algorithm is O(S log T + √(mT log T)), where S is the buffer size, m the number of job classes and T the number of time steps.

03

In the infinite-server case, the regret no longer depends on the buffer size S.
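The sketch below shows how the two terms of the finite-server bound in finding 02 trade off. It evaluates S log T and √(mT log T) with every hidden constant set to 1. That is an assumption made purely for illustration, since the O(·) notation does not fix the constants. The helper name and the parameter values are hypothetical. For a fixed buffer size S and number of classes m, the S log T term grows only logarithmically in T, so the √(mT log T) term eventually dominates.

```python
import math


def regret_bound_terms(S: int, m: int, T: int) -> tuple[float, float]:
    """Return the two terms of O(S log T + sqrt(m T log T)).

    Hypothetical helper: hidden constants are set to 1 (an illustrative
    assumption, not taken from the paper) and the natural log is used.
    """
    log_t = math.log(T)
    buffer_term = S * log_t                # S log T
    class_term = math.sqrt(m * T * log_t)  # sqrt(m T log T)
    return buffer_term, class_term


if __name__ == "__main__":
    S, m = 50, 3  # hypothetical buffer size and number of job classes
    for T in (10**2, 10**4, 10**6, 10**8):
        b, c = regret_bound_terms(S, m, T)
        print(f"T={T:>9}: S log T = {b:10.1f}, sqrt(m T log T) = {c:10.1f}")
```

With these hypothetical values the buffer term is larger at T = 100 (about 230 versus 37). The square-root term overtakes it by T = 10^4 (about 526 versus 461).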

Abstract

The expected regret of any reinforcement learning algorithm is lower bounded by $\Omega(\sqrt{DXAT})$ for undiscounted returns, where $D$ is the diameter of the Markov decision process, $X$ the size of the state space, $A$ the size of the action space and $T$ the number of time steps. However, this lower bound is general. A smaller regret can be obtained by taking into account some specific knowledge of the problem structure. In this article, we consider an admission control problem to an $M/M/c/S$ queue with $m$ job classes and class-dependent rewards and holding costs. Queuing systems often have a diameter that is exponential in the buffer size $S$, making the previous lower bound prohibitive for any practical use. We propose an algorithm inspired by UCRL2, and use the structure of the problem to upper bound the expected total regret by $O(S\log T + \sqrt{mT\log T})$ in…
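A small example shows why the diameter can grow exponentially in the buffer size S. The sketch below computes the expected time for an M/M/c/S queue to go from empty to full when every arriving job is admitted. Two assumptions are made for illustration. First, all job classes are pooled into a single arrival rate λ. Second, time is measured in continuous time rather than in steps of a uniformized chain.

Rejecting a job can only delay reaching a full buffer, so this hitting time is a natural lower bound on the diameter. The code uses the standard birth-death recursion h_k = (1 + μ_k h_{k-1}) / λ, where h_k is the expected time to move from k to k + 1 jobs. Here μ_k = min(k, c) μ is the total service rate with k jobs present. The function name and parameters are hypothetical, not the authors' code.

```python
def expected_fill_time(lam: float, mu: float, c: int, S: int) -> float:
    """Expected time for an M/M/c/S queue to go from 0 to S jobs when
    every arrival is admitted (hypothetical helper for illustration).

    h_k is the expected time to move from k to k + 1 jobs. With total
    service rate mu_k = min(k, c) * mu, first-step analysis gives
    h_k = (1 + mu_k * h_{k-1}) / lam, and h_0 = 1 / lam since mu_0 = 0.
    """
    total = 0.0
    h_prev = 0.0  # value unused at k = 0 because mu_0 = 0
    for k in range(S):
        mu_k = min(k, c) * mu          # departures possible from state k
        h_k = (1.0 + mu_k * h_prev) / lam
        total += h_k                   # time to climb from k to k + 1
        h_prev = h_k
    return total


if __name__ == "__main__":
    # Hypothetical parameters: arrival rate 1, service rate 2, one server.
    for S in (5, 10, 20, 40):
        print(S, expected_fill_time(1.0, 2.0, 1, S))
```

Take λ = 1, μ = 2 and c = 1. The recursion then gives h_k = 2^{k+1} - 1, so the fill time is 2^{S+1} - S - 2. It roughly doubles with each extra buffer slot, and for S = 10 it equals 2036. Growth of this kind appears whenever the load λ/(cμ) is below 1.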

Code & Models

Repositories

luweber21/ucrl-ac (official)

Taxonomy

Topics: Advanced Bandit Algorithms Research · Age of Information Optimization · Smart Grid Energy Management