Gradient-Bounded Dynamic Programming for Submodular and Concave Extensible Value Functions with Probabilistic Performance Guarantees
Denis Lebedev, Paul Goulart, Kostas Margellos

TL;DR
This paper introduces a dual dynamic programming algorithm for high-dimensional stochastic dynamic programs whose value functions are concave extensible and submodular. The algorithm comes with probabilistic performance guarantees and is demonstrated on a delivery pricing example.
Contribution
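The paper presents a dual dynamic programming algorithm that computes deterministic upper bounds and stochastic lower bounds of concave extensible, submodular value functions. It targets problems where the curse of dimensionality rules out computing the value function directly for all states and time steps.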
Findings
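The algorithm terminates after a finite number of iterations.
Probabilistic guarantees are derived on the value accumulated under the associated policy, both for a single realisation and in expectation.
The approach is demonstrated on a high-dimensional delivery pricing example.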
Abstract
We consider stochastic dynamic programming problems with high-dimensional, discrete state-spaces and finite, discrete-time horizons that prohibit direct computation of the value function from a given Bellman equation for all states and time steps due to the "curse of dimensionality". For the case where the value function of the dynamic program is concave extensible and submodular in its state-space, we present a new algorithm that computes deterministic upper and stochastic lower bounds of the value function in the realm of dual dynamic programming. We show that the proposed algorithm terminates after a finite number of iterations. Furthermore, we derive probabilistic guarantees on the value accumulated under the associated policy for a single realisation of the dynamic program and for the expectation of this value. Finally, we demonstrate the efficacy of our approach on a…
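To make the submodularity assumption concrete, here is a minimal Python sketch that brute-force checks the standard lattice inequality f(x ∨ y) + f(x ∧ y) ≤ f(x) + f(y) on a small integer box, where ∨ and ∧ are the componentwise maximum and minimum. It is not part of the paper's algorithm. The function names, the box-shaped state space, and the example functions are all assumptions chosen for illustration.

```python
from itertools import product


def is_submodular(f, upper, tol=1e-12):
    """Brute-force submodularity check on the box {0..upper[0]} x ... x {0..upper[d-1]}.

    Hypothetical helper, not taken from the paper. The number of points is
    prod(u + 1 for u in upper), so this is only feasible for tiny examples.
    """
    points = list(product(*(range(u + 1) for u in upper)))
    for x in points:
        for y in points:
            join = tuple(max(a, b) for a, b in zip(x, y))  # componentwise max
            meet = tuple(min(a, b) for a, b in zip(x, y))  # componentwise min
            if f(join) + f(meet) > f(x) + f(y) + tol:
                return False
    return True


def concave_of_sum(x):
    """Illustrative function: a concave function of the coordinate sum."""
    s = sum(x)
    return -(s * s)


def product_of_coords(x):
    """Illustrative counterexample: x[0] * x[1] violates the inequality."""
    return x[0] * x[1]


if __name__ == "__main__":
    print(is_submodular(concave_of_sum, (2, 2)))
    print(is_submodular(product_of_coords, (2, 2)))
```

The check enumerates every pair of states, so its cost grows with the square of the state count, which is itself exponential in the dimension. That is the same blow-up that makes direct evaluation of the Bellman equation impractical in the setting the abstract describes.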
