Efficiently Training Deep-Learning Parametric Policies using Lagrangian Duality

Andrew Rosemberg; Alexandre Street; Davi M. Valladão; Pascal Van Hentenryck

arXiv:2405.14973·cs.LG·February 21, 2025

Open Access

TL;DR

This paper presents TS-DDR, a novel deep learning-based method using Lagrangian duality to efficiently train policies for constrained Markov Decision Processes, significantly improving solution quality and computational efficiency in power system applications.

Contribution

Introduces TS-DDR, a self-supervised deep learning algorithm that trains parametric policies for CMDPs using Lagrangian duality with closed-form gradients.
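The phrase "Lagrangian duality with closed-form gradients" rests on a general sensitivity result: for a constrained optimization problem, the multiplier of a constraint gives the derivative of the optimal value with respect to that constraint's right-hand side. The sketch below checks this on a toy problem chosen purely for illustration; it is not the TS-DDR algorithm, and the problem, the function names `solve_toy` and `finite_difference`, and the sign convention L = f + λ(b − a·x) are all assumptions of this example.

```python
# Toy illustration (hypothetical, not the paper's method):
#   V(b) = min_x 0.5 * ||x||^2   subject to   a . x = b
# With Lagrangian L = 0.5*||x||^2 + lam * (b - a . x), stationarity gives
# x = lam * a, and feasibility gives lam = b / (a . a).
# Sensitivity theory then says dV/db = lam.


def solve_toy(a, b):
    """Return (x_opt, optimal_value, multiplier) for the toy problem."""
    aa = sum(ai * ai for ai in a)          # a . a
    lam = b / aa                           # dual multiplier of a . x = b
    x = [lam * ai for ai in a]             # primal minimizer
    value = 0.5 * sum(xi * xi for xi in x)
    return x, value, lam


def finite_difference(a, b, eps=1e-6):
    """Central-difference estimate of dV/db, used only as a check."""
    _, v_plus, _ = solve_toy(a, b + eps)
    _, v_minus, _ = solve_toy(a, b - eps)
    return (v_plus - v_minus) / (2 * eps)


a = [1.0, 2.0, 2.0]
b = 3.0
x_opt, value, lam = solve_toy(a, b)
fd = finite_difference(a, b)
print("multiplier:", lam)
print("finite-difference gradient:", fd)
```

In this example the multiplier and the finite-difference estimate agree to numerical precision, which is why a training loop that already solves the inner problem could read a gradient signal from the dual solution instead of differentiating through the solver.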

Findings

1. TS-DDR outperforms existing methods in solution quality.

2. TS-DDR reduces computation times by orders of magnitude.

3. Applied to power systems, TS-DDR demonstrates practical effectiveness.

Abstract

Constrained Markov Decision Processes (CMDPs) are critical in many high-stakes applications, where decisions must optimize cumulative rewards while strictly adhering to complex nonlinear constraints. In domains such as power systems, finance, supply chains, and precision robotics, violating these constraints can result in significant financial or societal costs. Existing Reinforcement Learning (RL) methods often struggle with sample efficiency and with finding feasible policies for highly and strictly constrained CMDPs, which limits their applicability in these environments. Stochastic dual dynamic programming is often used in practice on convex relaxations of the original problem, but it also encounters computational challenges and a loss of optimality. This paper introduces a novel approach, Two-Stage Deep Decision Rules (TS-DDR), to efficiently train parametric actor policies…

Taxonomy

Topics: Rough Sets and Fuzzy Logic · Multi-Criteria Decision Making