Bounding Procedures for Stochastic Dynamic Programs with Application to the Perimeter Patrol Problem
Myoungkuk Park, Krishnamoorthy Kalyanam, Swaroop Darbha, Phil Chandler, Meir Pachter

TL;DR
This paper introduces a linear programming method that constructs sub-optimal policies for high-dimensional stochastic dynamic programs, together with bounds on their deviation from the optimum, and demonstrates it on a perimeter patrol problem.
Contribution
It presents a novel LP-based approach to approximate and bound the value function in large MDPs. The resulting approximation is independent of the non-negative cost function (state-dependent weights) minimized in the LP, and the approach is applied to surveillance problems.
Findings
The approximate value function provides an upper bound for the optimal value.
The method yields a lower bound via a reduced-dimension MDP.
Numerical results validate the effectiveness of the approach.
Abstract
One often encounters the curse of dimensionality in the application of dynamic programming to determine optimal policies for controlled Markov chains. In this paper, we provide a method to construct sub-optimal policies along with a bound for the deviation of such a policy from the optimum via a linear programming approach. The state-space is partitioned and the optimal cost-to-go or value function is approximated by a constant over each partition. By minimizing a non-negative cost function defined on the partitions, one can construct an approximate value function which also happens to be an upper bound for the optimal value function of the original Markov Decision Process (MDP). As a key result, we show that this approximate value function is independent of the non-negative cost function (or state dependent weights as it is referred to in the literature) and moreover, this is the…
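The partition-and-bound idea in the abstract can be illustrated on a toy problem. The sketch below is not the authors' procedure. It assumes a discounted, reward-maximizing MDP (the abstract states neither the objective nor a discount factor), and every number, the partition PART, and the function names are hypothetical. Instead of calling an LP solver, it iterates to the smallest piecewise-constant vector that satisfies the Bellman inequalities. Under these assumptions, that vector minimizes any strictly positive weighting of the partition values, which is consistent with the independence claim above.

```python
# Toy MDP, all values invented for illustration.
# P[u][x][y]: probability of moving from state x to y under action u.
P = [
    [[0.9, 0.1, 0.0, 0.0],
     [0.1, 0.8, 0.1, 0.0],
     [0.0, 0.1, 0.8, 0.1],
     [0.0, 0.0, 0.1, 0.9]],
    [[0.2, 0.8, 0.0, 0.0],
     [0.0, 0.2, 0.8, 0.0],
     [0.0, 0.0, 0.2, 0.8],
     [0.8, 0.0, 0.0, 0.2]],
]
# R[u][x]: one-step reward for taking action u in state x (assumed objective).
R = [
    [0.0, 1.0, 2.0, 4.0],
    [0.5, 0.5, 1.0, 0.0],
]
GAMMA = 0.9          # assumed discount factor
PART = [0, 0, 1, 1]  # hypothetical partition: state index -> partition index


def value_iteration(P, R, gamma, tol=1e-10):
    """Optimal value function of the full MDP (feasible only for small problems)."""
    n, m = len(P[0]), len(P)
    v = [0.0] * n
    while True:
        new = [max(R[u][x] + gamma * sum(P[u][x][y] * v[y] for y in range(n))
                   for u in range(m))
               for x in range(n)]
        if max(abs(a - b) for a, b in zip(new, v)) < tol:
            return new
        v = new


def aggregated_upper_bound(P, R, gamma, part, tol=1e-10):
    """Smallest w with w[part[x]] >= R[u][x] + gamma * E[w[part[y]]] for all x, u.

    Computed by fixed-point iteration rather than an LP solver.
    Every partition index in range(max(part) + 1) must contain a state.
    """
    n, m, k = len(P[0]), len(P), max(part) + 1
    w = [0.0] * k
    while True:
        new = [max(R[u][x] + gamma * sum(P[u][x][y] * w[part[y]] for y in range(n))
                   for x in range(n) if part[x] == i
                   for u in range(m))
               for i in range(k)]
        if max(abs(a - b) for a, b in zip(new, w)) < tol:
            return new
        w = new


if __name__ == "__main__":
    v_opt = value_iteration(P, R, GAMMA)
    w = aggregated_upper_bound(P, R, GAMMA, PART)
    for x, vx in enumerate(v_opt):
        print(f"state {x}: optimal {vx:.4f} <= bound {w[PART[x]]:.4f}")
```

Running the script prints, for each state, the optimal value next to the constant assigned to its partition. The bound only needs one unknown per partition, which is the source of the dimensionality reduction; how tight it is depends on how the partition is chosen.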
Taxonomy
Topics: Optimization and Search Problems · Reliability and Maintenance Optimization
