Reinforcement Learning and Regret Bounds for Admission Control
Lucas Weber, Ana Bušić, Jiamin Zhu

TL;DR
This paper studies regret bounds for reinforcement learning in admission control for queueing systems. It proposes an algorithm that exploits the structure of the problem to obtain a smaller regret bound than the one that holds for general Markov decision processes (MDPs).
Contribution
It introduces a structure-aware reinforcement learning algorithm, inspired by UCRL2, for admission control in queueing systems, and uses the structure of the problem to tighten the regret bound.
Findings
Regret bounds for structured queueing problems can be smaller than the bound that holds for general MDPs.
The proposed algorithm achieves an expected total regret of O(S log T + √(mT log T)) in the finite-server case, where S is the buffer size, m the number of job classes and T the number of time steps.
In the infinite-server case, the regret no longer depends on the buffer size S.
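To get a feel for the gap between the two orders of growth, the following sketch evaluates both expressions numerically. It is only an illustration: the general lower bound is taken to be of order √(DXAT) (D the diameter, X the number of states, A the number of actions), all hidden constants are set to 1, and the choices D = 2^S, X = S + 1 and A = 2^m are hypothetical modelling assumptions, not values taken from the paper.

```python
import math

def general_lower_bound(D, X, A, T):
    """Order of the general bound sqrt(D*X*A*T); constant factor assumed to be 1."""
    return math.sqrt(D * X * A * T)

def structured_upper_bound(S, m, T):
    """Order of S*log(T) + sqrt(m*T*log(T)); constant factor assumed to be 1."""
    return S * math.log(T) + math.sqrt(m * T * math.log(T))

def compare(S, m, T):
    # Hypothetical modelling choices, for illustration only:
    D = 2.0 ** S   # diameter assumed to grow like 2^S with the buffer size
    X = S + 1      # states assumed to be the number of jobs in the system, 0..S
    A = 2 ** m     # actions assumed to be one accept/reject decision per class
    return general_lower_bound(D, X, A, T), structured_upper_bound(S, m, T)

if __name__ == "__main__":
    for S in (5, 10, 20, 40):
        general, structured = compare(S, m=2, T=10**6)
        print(f"S={S:2d}  general ~ {general:.3e}  structured ~ {structured:.3e}")
```

Under these assumptions the first quantity grows exponentially in S while the second grows only linearly, which is the qualitative point of the findings. The absolute numbers carry no meaning because the constants are unknown.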
Abstract
The expected regret of any reinforcement learning algorithm is lower bounded by Ω(√(DXAT)) for undiscounted returns, where D is the diameter of the Markov decision process, X the size of the state space, A the size of the action space and T the number of time steps. However, this lower bound is general. A smaller regret can be obtained by taking into account some specific knowledge of the problem structure. In this article, we consider an admission control problem to an M/M/c/S queue with m job classes and class-dependent rewards and holding costs. Queueing systems often have a diameter that is exponential in the buffer size S, making the previous lower bound prohibitive for any practical use. We propose an algorithm inspired by UCRL2, and use the structure of the problem to upper bound the expected total regret by O(S log T + √(mT log T)) in…
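The admission control setting itself can be sketched as a small simulation. The code below is not the paper's algorithm (there is no learning in it). It only simulates a finite-buffer, multi-server queue with several job classes under a fixed threshold policy, to show what the controller decides and how rewards and holding costs accrue. The function name `simulate`, the threshold policy, the single service rate `mu` shared by all classes, and the rule that a departing job's class is drawn in proportion to the class counts are all simplifying assumptions made for this illustration.

```python
import random

def simulate(lam, reward, hold, mu, c, S, thresholds, horizon, seed=0):
    """Average reward per unit time under a fixed threshold admission policy.

    lam[i], reward[i], hold[i]: arrival rate, admission reward and holding
    cost rate of class i. A class-i job is admitted only if the number of
    jobs in the system is below min(S, thresholds[i]) (assumed policy).
    """
    rng = random.Random(seed)
    m = len(lam)
    counts = [0] * m              # jobs of each class currently in the system
    t, total = 0.0, 0.0
    while True:
        n = sum(counts)
        dep_rate = mu * min(n, c)             # at most c jobs are in service
        rate = sum(lam) + dep_rate            # total event rate
        dt = rng.expovariate(rate)            # time until the next event
        cost_rate = sum(h * k for h, k in zip(hold, counts))
        if t + dt >= horizon:
            total -= (horizon - t) * cost_rate
            break
        total -= dt * cost_rate
        t += dt
        if rng.random() * rate < dep_rate:
            # Departure: class drawn proportionally to counts (assumption).
            i = rng.choices(range(m), weights=counts)[0]
            counts[i] -= 1
        else:
            # Arrival: class drawn proportionally to the arrival rates.
            i = rng.choices(range(m), weights=lam)[0]
            if n < min(S, thresholds[i]):
                counts[i] += 1
                total += reward[i]
    return total / horizon

if __name__ == "__main__":
    avg = simulate(lam=[1.0, 1.0], reward=[1.0, 2.0], hold=[0.1, 0.1],
                   mu=1.0, c=2, S=5, thresholds=[3, 5], horizon=10_000)
    print(f"average reward per unit time: {avg:.3f}")
```

A learning algorithm in this setting would have to choose the admission decisions without knowing the arrival and service rates. The regret then measures how much reward it loses over T steps compared with the best policy.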
Taxonomy
Topics: Advanced Bandit Algorithms Research · Age of Information Optimization · Smart Grid Energy Management
