Efficiently Training Deep-Learning Parametric Policies using Lagrangian Duality
Andrew Rosemberg, Alexandre Street, Davi M. Valladão, Pascal Van Hentenryck

TL;DR
This paper presents TS-DDR, a novel deep learning-based method using Lagrangian duality to efficiently train policies for constrained Markov Decision Processes, significantly improving solution quality and computational efficiency in power system applications.
Contribution
Introduces TS-DDR, a self-supervised deep learning algorithm that trains parametric policies for CMDPs using Lagrangian duality with closed-form gradients.
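The "closed-form gradients" idea rests on a standard duality fact: the Lagrange multiplier of a constraint gives the sensitivity of the optimal value to that constraint's right-hand side. The sketch below illustrates only this fact on a toy problem and is not the paper's algorithm. The inner problem (minimize 0.5 * sum(x_i^2) subject to sum(x_i) = target), the linear one-parameter policy target = theta * state, and all function names are assumptions chosen for illustration.

```python
def solve_inner(target, n=4):
    """Toy inner problem (assumed): min 0.5*sum(x_i^2) s.t. sum(x_i) = target.

    With L = 0.5*sum(x_i^2) + lam*(target - sum(x_i)), stationarity gives
    x_i = lam, and feasibility gives lam = target / n.
    Returns (optimal value, multiplier of the target constraint).
    """
    x = [target / n] * n
    value = 0.5 * sum(xi * xi for xi in x)
    lam = target / n
    return value, lam


def finite_difference(target, n=4, eps=1e-6):
    """Numerical derivative of the optimal value with respect to the target."""
    up, _ = solve_inner(target + eps, n)
    down, _ = solve_inner(target - eps, n)
    return (up - down) / (2 * eps)


def policy_gradient(theta, state, n=4):
    """Gradient of the optimal value w.r.t. theta for target = theta * state.

    Chain rule: dV/dtheta = (dV/dtarget) * (dtarget/dtheta) = lam * state,
    so no differentiation through the solver itself is needed.
    """
    target = theta * state
    _, lam = solve_inner(target, n)
    return lam * state


value, lam = solve_inner(3.0)
fd = finite_difference(3.0)
grad = policy_gradient(theta=0.5, state=2.0)
print(f"value={value:.4f}  multiplier={lam:.4f}  finite-diff={fd:.4f}  grad={grad:.4f}")
```

Here the multiplier matches the finite-difference derivative, which is why a dual solution can stand in for a gradient when the policy's output enters the inner problem as a constraint right-hand side. In a realistic setting the inner problem would be handed to an optimization solver that reports dual values, rather than solved by hand.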
Findings
TS-DDR outperforms existing methods in solution quality.
TS-DDR reduces computation times by orders of magnitude.
Applied to power systems, TS-DDR demonstrates practical effectiveness.
Abstract
Constrained Markov Decision Processes (CMDPs) are critical in many high-stakes applications, where decisions must optimize cumulative rewards while strictly adhering to complex nonlinear constraints. In domains such as power systems, finance, supply chains, and precision robotics, violating these constraints can result in significant financial or societal costs. Existing Reinforcement Learning (RL) methods often struggle with sample efficiency and with finding feasible policies for highly and strictly constrained CMDPs, limiting their applicability in these environments. Stochastic dual dynamic programming is often used in practice on convex relaxations of the original problem, but it too faces computational challenges and a loss of optimality. This paper introduces a novel approach, Two-Stage Deep Decision Rules (TS-DDR), to efficiently train parametric actor policies…
Taxonomy
Topics: Rough Sets and Fuzzy Logic · Multi-Criteria Decision Making
