QPLEX Decision Processes: Formulation via Nonlinear Markov Chains and Optimization via Policy Gradients
Antonius B. Dieker, Steven T. Hackman, Zitong Wang, Yunhao Yan

TL;DR
This paper introduces QPLEX Decision Processes (QDPs), a framework that embeds the QPLEX modeling methodology in a nonlinear Markov decision framework to optimize queueing systems with non-stationary arrivals, general service distributions, and service-level chance constraints.
Contribution
The paper develops a new QDP framework that leverages nonlinear transition probabilities and gradient-based optimization to handle high-dimensional queueing control problems efficiently.
Findings
QDP can compute near-optimal policies in seconds for certain reward structures.
QDP effectively handles non-stationary demand and service-level constraints.
The approach circumvents the curse of dimensionality associated with general service times by working on state spaces that are orders of magnitude smaller.
Abstract
We introduce a QPLEX Decision Process (QDP) as a model for dynamic control of queueing systems with non-stationary arrivals, general service distributions, and service-level chance constraints. QDPs integrate QPLEX, a computational modeling methodology for transient analysis of stochastic systems, into a nonlinear Markov decision framework. Since QPLEX approximations use nonlinear transition probabilities with orders-of-magnitude smaller state spaces, QDPs circumvent the curse of dimensionality associated with general service times. Via forward and backward iterative schemes, we can rapidly compute gradients deterministically on the much smaller state space, eliminating sampling variance. We further address optimization through natural-gradient-inspired methods with block-diagonal Fisher approximations. To illustrate the QDP methodology, we formulate a single-station dynamic pricing…
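As a rough illustration of the forward and backward schemes mentioned in the abstract, the Python sketch below computes an exact (sampling-free) policy gradient on a toy nonlinear Markov chain. It is not the QPLEX approximation or the paper's algorithm. The birth-death dynamics, the service probability that depends on the current distribution, the reward vector `c`, and every function and parameter name are assumptions made up for this example.

```python
import numpy as np

def make_flows(n):
    """Hypothetical flow matrices: U moves mass i -> i+1, D moves mass i+1 -> i."""
    U = np.zeros((n, n))
    D = np.zeros((n, n))
    for i in range(n - 1):
        U[i + 1, i] += 1.0
        U[i, i] -= 1.0
        D[i, i + 1] += 1.0
        D[i + 1, i + 1] -= 1.0
    return U, D

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def step(p, theta, U, D, arr=0.4, mu=0.5):
    """One transition p_t -> p_{t+1}. The service probability depends on p
    itself, so the map is nonlinear in the distribution (toy assumption)."""
    a = arr * sigmoid(theta)      # policy-controlled arrival probability
    s = mu * (1.0 - p[0])         # distribution-dependent service probability
    return p + a * (U @ p) + s * (D @ p)

def forward(p0, thetas, U, D):
    """Forward scheme: propagate the state distribution deterministically."""
    ps = [p0]
    for th in thetas:
        ps.append(step(ps[-1], th, U, D))
    return ps

def objective(ps, c):
    """Total expected reward J = sum over t = 1..T of c . p_t."""
    return sum(c @ p for p in ps[1:])

def backward(ps, thetas, c, U, D, arr=0.4, mu=0.5):
    """Backward scheme: adjoint recursion giving dJ/dtheta_t for every t."""
    n, T = len(c), len(thetas)
    e0 = np.zeros(n)
    e0[0] = 1.0
    grad = np.zeros(T)
    adj = c.copy()                # adjoint dJ/dp_T
    for t in range(T - 1, -1, -1):
        p, sg = ps[t], sigmoid(thetas[t])
        # Sensitivity of p_{t+1} to theta_t, contracted with the adjoint.
        grad[t] = adj @ (arr * sg * (1.0 - sg) * (U @ p))
        # Jacobian of the step with respect to p_t, including the nonlinear term.
        A = (np.eye(n) + arr * sg * U + mu * (1.0 - p[0]) * D
             - mu * np.outer(D @ p, e0))
        adj = A.T @ adj + (c if t >= 1 else 0.0)
    return grad

if __name__ == "__main__":
    U, D = make_flows(5)
    p0 = np.zeros(5)
    p0[0] = 1.0                               # start with an empty system
    thetas = np.array([0.3, -0.2, 0.5, 0.1])  # one policy parameter per period
    c = -(np.arange(5) - 2.0) ** 2            # hypothetical per-state reward
    ps = forward(p0, thetas, U, D)
    print(backward(ps, thetas, c, U, D))
```

Because both passes operate on the distribution rather than on sampled trajectories, the gradient carries no Monte Carlo variance. Comparing `backward` against central finite differences of `objective` is a quick way to check a hand-derived adjoint like this one.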
