A policy gradient approach for Finite Horizon Constrained Markov Decision Processes

Soumyajit Guin; Shalabh Bhatnagar

arXiv:2210.04527·cs.LG·March 21, 2025·1 cites

TL;DR

This paper introduces the first policy gradient algorithm tailored for finite horizon constrained Markov Decision Processes, enabling the derivation of non-stationary, stage-dependent policies with proven convergence.

Contribution

It develops a novel constrained policy gradient method specifically for finite horizon problems, addressing the gap in existing infinite horizon-focused algorithms.

Findings

1. The algorithm converges to a constrained optimal policy.
2. It performs better than existing algorithms in experiments.
3. It handles large or continuous state and action spaces effectively.

Abstract

The infinite horizon setting is widely adopted for problems of reinforcement learning (RL), and it invariably results in optimal policies that are stationary. In many situations, however, finite horizon control problems are of interest, and for such problems the optimal policies are time-varying in general. Another setting that has become popular in recent times is that of Constrained Reinforcement Learning, where the agent maximizes its rewards while also aiming to satisfy some given constraint criteria. However, this setting has only been studied in the context of infinite horizon MDPs, where stationary policies are optimal. We present an algorithm for constrained RL in the Finite Horizon Setting, where the horizon terminates after a fixed (finite) time. We use function approximation in our algorithm, which is essential when the state and action spaces are large or continuous, and use the policy…
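To make the setting in the abstract concrete, here is a minimal sketch of a stage-dependent (non-stationary) policy trained with a generic Lagrangian policy gradient. It is not taken from the paper or its repository: the toy two-state dynamics, the rewards and costs, the tabular softmax parameterization, the cost threshold `alpha`, the step sizes, and all function names are assumptions chosen only for illustration. The paper itself uses function approximation, which this tabular sketch does not attempt.

```python
import math
import random

def softmax(prefs):
    """Numerically stable softmax over a list of action preferences."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]

def run_episode(theta, horizon, rng):
    """Roll out one episode; theta[h][s] holds the preferences used at stage h."""
    s, traj = 0, []
    for h in range(horizon):
        probs = softmax(theta[h][s])
        a = 0 if rng.random() < probs[0] else 1
        reward = 1.0 if a == 1 else 0.2  # hypothetical: action 1 pays more ...
        cost = 1.0 if a == 1 else 0.0    # ... but incurs constraint cost
        traj.append((h, s, a, probs, reward, cost))
        s = a  # hypothetical transition: next state equals the action taken
    return traj

def train(horizon=4, alpha=2.0, episodes=5000,
          lr_theta=0.05, lr_lam=0.01, seed=0):
    """REINFORCE on the Lagrangian, with projected dual ascent on lam >= 0."""
    rng = random.Random(seed)
    # One preference table per stage, so the policy may differ across time.
    theta = [[[0.0, 0.0] for _ in range(2)] for _ in range(horizon)]
    lam = 0.0
    for _ in range(episodes):
        traj = run_episode(theta, horizon, rng)
        total_cost = sum(step[5] for step in traj)
        for i, (h, s, a, probs, _, _) in enumerate(traj):
            # Lagrangian reward-to-go from stage h to the end of the horizon.
            g = sum(r - lam * c for (_, _, _, _, r, c) in traj[i:])
            for b in range(2):
                grad_log = (1.0 if b == a else 0.0) - probs[b]
                theta[h][s][b] += lr_theta * g * grad_log
        # Raise lam when the episode cost exceeds alpha, lower it otherwise.
        lam = max(0.0, lam + lr_lam * (total_cost - alpha))
    return theta, lam
```

Calling `train()` returns the per-stage preference tables and the final multiplier. Because `theta` is indexed by the stage `h`, the learned action probabilities in the same state can differ between early and late stages, which is the time-varying behavior the abstract attributes to finite horizon problems.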

Code & Models

Repositories

gsoumyajit/Finite-Horizon-with-constraints
PyTorch · Official

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Advanced Control Systems Optimization