Empirical Evaluation of Policy-Based Reinforcement Learning for Dynamic Service Control in an M/M/1 Queue
Joseph Walton, Gabriel Nicolosi

TL;DR
This paper empirically evaluates three policy-based reinforcement learning algorithms for controlling service rates in an M/M/1 queue, focusing on their efficiency, convergence, and policy quality.
Contribution
It systematically compares REINFORCE, Actor-Critic, and PPO algorithms in a queuing context, highlighting their performance differences and practical applicability.
Findings
PPO converges faster than REINFORCE and Actor-Critic.
Using augmented state information improves policy performance.
All algorithms achieve near-optimal policies with sufficient training.
Abstract
While reinforcement learning (RL) has been increasingly applied to stochastic control, few studies have systematically examined policy-based methods in queuing environments modeled as semi-Markov decision processes (SMDPs). To address this gap, we investigate how policy-based RL algorithms perform when applied to the control of service rates in an M/M/1 queue, a common queuing model for manufacturing, computing, and service systems. The problem is formulated as an SMDP in which a decision occurs at the start of each new service: the agent selects a service rate from a finite set of speeds, with the aim of minimizing an objective function that trades off system congestion against energy costs. Three policy-based reinforcement learning algorithms, namely REINFORCE, Actor-Critic (A2C), and Proximal Policy Optimization (PPO), are trained in a simulated environment using two state…
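The SMDP formulation described in the abstract can be illustrated with a minimal simulation sketch. This is not the authors' environment: the class name `MM1ServiceControl`, the arrival rate, the set of service rates, the linear holding cost, and the quadratic energy cost are all assumptions chosen for illustration. The state is taken to be the number of customers in the system at a decision epoch, and each step returns the cost accrued and the time elapsed until the next decision.

```python
import random


class MM1ServiceControl:
    """Hypothetical M/M/1 service-rate control environment (illustrative only)."""

    def __init__(self, arrival_rate=1.0, service_rates=(1.5, 2.0, 3.0),
                 holding_cost=1.0, energy_coeff=0.5, seed=0):
        self.lam = arrival_rate              # Poisson arrival rate (assumed value)
        self.service_rates = service_rates   # finite set of speeds (assumed values)
        self.h = holding_cost                # cost per customer per unit time (assumed)
        self.c = energy_coeff                # energy cost coefficient (assumed)
        self.rng = random.Random(seed)
        self.n = 1                           # customers in system at a decision epoch

    def reset(self):
        self.n = 1
        return self.n

    def step(self, action):
        """Serve one customer at the chosen rate; return (state, cost, elapsed)."""
        mu = self.service_rates[action]
        service_time = self.rng.expovariate(mu)

        # Customers already present are held for the whole service time.
        holding = self.n * service_time
        arrivals = 0
        t = self.rng.expovariate(self.lam)
        while t < service_time:
            arrivals += 1
            holding += service_time - t      # a new arrival waits for the remainder
            t += self.rng.expovariate(self.lam)

        # Assumed cost structure: linear congestion cost plus quadratic energy cost.
        cost = self.h * holding + self.c * mu ** 2 * service_time

        self.n = self.n + arrivals - 1
        elapsed = service_time
        if self.n == 0:
            # Idle until the next arrival; no cost accrues while the system is empty.
            elapsed += self.rng.expovariate(self.lam)
            self.n = 1
        return self.n, cost, elapsed


def average_cost(env, policy, steps=10_000):
    """Estimate long-run average cost per unit time under a fixed policy."""
    state = env.reset()
    total_cost = total_time = 0.0
    for _ in range(steps):
        state, cost, elapsed = env.step(policy(state))
        total_cost += cost
        total_time += elapsed
    return total_cost / total_time


def threshold_policy(n):
    """Hypothetical baseline: use the fastest rate once 3 or more customers are present."""
    return 2 if n >= 3 else 0
```

Calling `average_cost(MM1ServiceControl(seed=1), threshold_policy)` gives a baseline average cost against which a learned policy could be compared. Because the time between decisions is random, dividing total cost by total elapsed time (rather than by the number of steps) is what makes this an SMDP-style average-cost estimate.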
