Processing Network Controls via Deep Reinforcement Learning
Mark Gluzman

TL;DR
This dissertation develops the theory and practical application of advanced policy gradient algorithms such as PPO for processing network control problems modeled as MDPs and SMDPs, and reports control policies that outperform existing heuristics in queueing systems and perform well in ride-hailing systems.
Contribution
It refines policy improvement bounds for MDPs and SMDPs and customizes PPO for processing network control problems, showing improved performance over existing heuristics.
Findings
PPO with modifications outperforms heuristics in queueing networks.
PPO effectively solves ride-hailing driver repositioning.
New policy improvement bounds strengthen the theoretical justification of advanced policy gradient (APG) algorithms.
Abstract
Novel advanced policy gradient (APG) algorithms, such as proximal policy optimization (PPO), trust region policy optimization, and their variations, have become the dominant reinforcement learning (RL) algorithms because of their ease of implementation and good practical performance. This dissertation is concerned with the theoretical justification and practical application of APG algorithms for solving processing network control optimization problems. Processing network control problems are typically formulated as Markov decision process (MDP) or semi-Markov decision process (SMDP) problems that have several features unconventional for RL: infinite state spaces, unbounded costs, and long-run average cost objectives. Policy improvement bounds play a crucial role in the theoretical justification of APG algorithms. In this thesis we refine existing bounds for MDPs with finite state spaces…
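For readers unfamiliar with PPO, the following minimal sketch shows the standard clipped surrogate objective from the general PPO literature, in its usual reward-maximizing form. It is not the dissertation's customized variant for long-run average cost problems; the function name `ppo_clipped_surrogate`, its arguments, and the default `epsilon=0.2` are assumptions chosen for illustration.

```python
def ppo_clipped_surrogate(ratios, advantages, epsilon=0.2):
    """Mean clipped surrogate objective over a batch (to be maximized).

    ratios:     new-policy / old-policy action probabilities, one per sample
    advantages: advantage estimates for the same samples
    epsilon:    clipping range (illustrative default, not from the source)
    """
    if len(ratios) != len(advantages):
        raise ValueError("ratios and advantages must have the same length")
    if not ratios:
        raise ValueError("batch must not be empty")

    total = 0.0
    for r, a in zip(ratios, advantages):
        # Restrict the probability ratio to [1 - epsilon, 1 + epsilon].
        clipped_r = max(1.0 - epsilon, min(r, 1.0 + epsilon))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(r * a, clipped_r * a)
    return total / len(ratios)


if __name__ == "__main__":
    # Hypothetical batch of three samples.
    print(ppo_clipped_surrogate([1.5, 0.5, 1.0], [1.0, -1.0, 2.0]))
```

The clipping removes the incentive to move the new policy far from the old one within a single update, which is the kind of controlled step that policy improvement bounds are meant to justify.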
Taxonomy
Topics: Network Traffic and Congestion Control · Distributed systems and fault tolerance · Age of Information Optimization
Methods: Entropy Regularization · Proximal Policy Optimization
