Processing Network Controls via Deep Reinforcement Learning

Mark Gluzman

arXiv:2205.02119·math.OC·May 5, 2022

TL;DR

This dissertation develops theoretical justification for advanced policy gradient algorithms such as PPO and applies them to processing network control problems modeled as MDPs and SMDPs, obtaining control policies that outperform existing heuristics in queueing networks and that solve driver repositioning in ride-hailing systems.

Contribution

It refines policy improvement bounds for MDPs and SMDPs and customizes PPO for processing network control problems, showing improved performance over existing heuristics.

Findings

01

PPO with modifications outperforms heuristics in queueing networks.

02

PPO effectively solves ride-hailing driver repositioning.

03

New policy improvement bounds strengthen the theoretical justification of advanced policy gradient (APG) algorithms.

Abstract

Novel advanced policy gradient (APG) algorithms, such as proximal policy optimization (PPO), trust region policy optimization, and their variations, have become the dominant reinforcement learning (RL) algorithms because of their ease of implementation and good practical performance. This dissertation is concerned with the theoretical justification and practical application of APG algorithms for solving processing network control optimization problems. Processing network control problems are typically formulated as Markov decision process (MDP) or semi-Markov decision process (SMDP) problems that have several features unconventional for RL: infinite state spaces, unbounded costs, and long-run average cost objectives. Policy improvement bounds play a crucial role in the theoretical justification of APG algorithms. In this thesis we refine existing bounds for MDPs with finite state spaces…

Taxonomy

Topics: Network Traffic and Congestion Control · Distributed Systems and Fault Tolerance · Age of Information Optimization

Methods: Entropy Regularization · Proximal Policy Optimization