Operator Splitting for Convex Constrained Markov Decision Processes

Panagiotis D. Grontas; Anastasios Tsiamis; John Lygeros

arXiv:2412.14002·math.OC·December 19, 2024

Open Access

TL;DR

This paper introduces a scalable first-order operator splitting algorithm for convex constrained Markov decision processes, enabling efficient handling of complex constraints with guaranteed convergence.

Contribution

It develops a novel Douglas-Rachford splitting-based method that decomposes MDP dynamics and constraints, improving scalability and flexibility over traditional convex optimization approaches.

Findings

01. The algorithm performs favorably on benchmark problems.

02. The method ensures last-iterate convergence and numerical stability.

03. It detects infeasibility and computes minimally violating policies.

Abstract

We consider finite Markov decision processes (MDPs) with convex constraints and known dynamics. In principle, this problem is amenable to off-the-shelf convex optimization solvers, but typically this approach suffers from poor scalability. In this work, we develop a first-order algorithm, based on the Douglas-Rachford splitting, that allows us to decompose the dynamics and constraints. Thanks to this decoupling, we can incorporate a wide variety of convex constraints. Our scheme consists of simple and easy-to-implement updates that alternate between solving a regularized MDP and a projection. The inherent presence of regularized updates ensures last-iterate convergence, numerical stability, and, contrary to existing approaches, does not require us to regularize the problem explicitly. If the constraints are not attainable, we exploit salient properties of the Douglas-Rachford algorithm…
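To make the "regularized solve, then projection" pattern concrete, here is a minimal sketch of a generic Douglas-Rachford iteration on a toy problem. It is not the paper's algorithm: the probability simplex merely stands in for the set of valid occupancy measures, a single half-space stands in for the convex constraint, and all names (`project_simplex`, `project_halfspace`, `douglas_rachford_toy`, `gamma`) are hypothetical. The toy problem is: minimize c·x over the simplex subject to d·x <= b.

```python
# Toy Douglas-Rachford splitting (illustrative assumption, not the paper's method).
# f(x) = c.x + indicator(simplex)   -> prox is a regularized linear solve
# g(x) = indicator(d.x <= b)        -> prox is a projection


def project_simplex(v):
    """Euclidean projection of v onto the probability simplex (sort-based)."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for j, u_j in enumerate(u, start=1):
        cumulative += u_j
        t = (cumulative - 1.0) / j
        if u_j - t > 0:
            theta = t  # keep the threshold of the last index that passes
    return [max(v_i - theta, 0.0) for v_i in v]


def project_halfspace(v, d, b):
    """Euclidean projection of v onto the half-space {x : d.x <= b}."""
    violation = sum(d_i * v_i for d_i, v_i in zip(d, v)) - b
    if violation <= 0:
        return list(v)
    scale = violation / sum(d_i * d_i for d_i in d)
    return [v_i - scale * d_i for v_i, d_i in zip(v, d)]


def douglas_rachford_toy(c, d, b, gamma=1.0, iterations=500):
    """Alternate a regularized solve over the simplex with a projection."""
    n = len(c)
    z = [1.0 / n] * n  # governing sequence, arbitrary starting point
    x = list(z)
    for _ in range(iterations):
        # Regularized step: argmin_x c.x + ||x - z||^2 / (2 * gamma) over the simplex,
        # which equals the simplex projection of z - gamma * c.
        x = project_simplex([z_i - gamma * c_i for z_i, c_i in zip(z, c)])
        # Projection step applied to the reflected point 2x - z.
        reflected = [2.0 * x_i - z_i for x_i, z_i in zip(x, z)]
        y = project_halfspace(reflected, d, b)
        # Douglas-Rachford update of the governing sequence.
        z = [z_i + y_i - x_i for z_i, y_i, x_i in zip(z, y, x)]
    return x


# Costs 1, 2, 3 with at most 0.4 of the mass allowed on the cheapest entry.
x_star = douglas_rachford_toy(c=[1.0, 2.0, 3.0], d=[1.0, 0.0, 0.0], b=0.4)
print(x_star)
```

For this toy instance the linear program can be solved by hand: the cheapest entry is capped at 0.4 and the remaining mass goes to the second-cheapest entry, so the iterate should approach (0.4, 0.6, 0). The two steps never see each other's structure, which is the decoupling the abstract describes; swapping the half-space for another convex set only changes the projection routine.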

Taxonomy

Topics: Simulation Techniques and Applications