Queueing Network Controls via Deep Reinforcement Learning

J. G. Dai; Mark Gluzman

arXiv:2008.01644·math.OC·March 22, 2022

TL;DR

This paper develops a PPO-based reinforcement learning approach for queueing network control problems. The resulting control policies outperform state-of-the-art heuristics under load conditions ranging from light to heavy traffic.

Contribution

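It extends advanced policy gradient (APG) methods such as PPO to queueing network control problems with an infinite state space, unbounded costs, and a long-run average cost objective, and it incorporates variance reduction techniques so that control policies can be learned effectively.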
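To make the setting concrete, here is a minimal sketch of a long-run average cost estimate for a toy single-server queue. It is not the paper's model or code: the function name `average_holding_cost`, the uniformized discrete-time dynamics, and the choice of queue length as the per-step cost are all assumptions made for illustration. The state (queue length) can grow without bound, so the state space is infinite and the cost is unbounded.

```python
import random


def average_holding_cost(arrival_rate, service_rate, num_steps, seed=0):
    """Estimate the long-run average queue length of a single-server queue.

    Hypothetical toy model: a uniformized chain in which each step is an
    arrival with probability arrival_rate / (arrival_rate + service_rate)
    and otherwise a service completion (ignored if the queue is empty).
    """
    rng = random.Random(seed)
    p_arrival = arrival_rate / (arrival_rate + service_rate)
    queue_length = 0
    total_cost = 0
    for _ in range(num_steps):
        # Per-step cost is the current queue length, which is unbounded.
        total_cost += queue_length
        if rng.random() < p_arrival:
            queue_length += 1
        elif queue_length > 0:
            queue_length -= 1
    return total_cost / num_steps


estimate = average_holding_cost(arrival_rate=0.5, service_rate=1.0, num_steps=200_000)
print(f"estimated long-run average cost: {estimate:.3f}")
```

For these rates the load is 0.5, and the stationary mean queue length of this chain is load / (1 - load) = 1, so the printed estimate should be close to 1. A control problem adds a decision at each step (for example, which job class to serve); this sketch has no decisions and only illustrates the objective.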

Findings

01. PPO policies outperform state-of-the-art heuristics under load conditions from light to heavy traffic.

02. The proposed variance reduction techniques improve the stability and accuracy of value function estimation.

03. The learned policies are near-optimal in cases where the optimal solution is known.

Abstract

Novel advanced policy gradient (APG) methods, such as trust region policy optimization (TRPO) and proximal policy optimization (PPO), have become the dominant reinforcement learning algorithms because of their ease of implementation and good practical performance. A conventional setup for notoriously difficult queueing network control problems is a Markov decision problem (MDP) that has three features: infinite state space, unbounded costs, and long-run average cost objective. We extend the theoretical framework of these APG methods for such MDP problems. The resulting PPO algorithm is tested on a parallel-server system and large-size multiclass queueing networks. The algorithm consistently generates control policies that outperform state-of-the-art heuristics in the literature in a variety of load conditions from light to heavy traffic. These policies are demonstrated to be near-optimal when the…
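For readers unfamiliar with PPO, the following is a minimal sketch of the standard clipped surrogate objective, written with NumPy. It is not taken from the paper or its repository: the function name `ppo_clipped_surrogate`, the argument names, and the default `clip_eps=0.2` are assumptions. It uses the usual reward-maximization convention; in a cost-minimization setting such as queueing control, the sign of the advantage term would flip.

```python
import numpy as np


def ppo_clipped_surrogate(new_probs, old_probs, advantages, clip_eps=0.2):
    """Standard PPO clipped surrogate, averaged over sampled state-action pairs.

    new_probs / old_probs: probabilities of the sampled actions under the
    candidate policy and under the policy that generated the data.
    advantages: advantage estimates for the same samples.
    """
    new_probs = np.asarray(new_probs, dtype=float)
    old_probs = np.asarray(old_probs, dtype=float)
    advantages = np.asarray(advantages, dtype=float)

    ratio = new_probs / old_probs
    clipped_ratio = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    # Taking the elementwise minimum removes the incentive to move the
    # probability ratio outside [1 - clip_eps, 1 + clip_eps].
    return float(np.mean(np.minimum(ratio * advantages, clipped_ratio * advantages)))


# When the candidate policy equals the data-generating policy, the ratio is 1
# and the surrogate reduces to the mean advantage.
value = ppo_clipped_surrogate([0.5, 0.5], [0.5, 0.5], [1.0, -1.0])
print(value)
```

In practice the probabilities come from a parameterized policy (typically a neural network) and the surrogate is maximized by gradient ascent over several passes through the same batch of samples.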

Code & Models

Repositories

mark-gluzman/NmodelPPO

Taxonomy

Methods: Entropy Regularization · Proximal Policy Optimization