A policy gradient approach for Finite Horizon Constrained Markov Decision Processes
Soumyajit Guin, Shalabh Bhatnagar

TL;DR
This paper introduces the first policy gradient algorithm for finite horizon constrained Markov Decision Processes. It learns non-stationary, stage-dependent policies and comes with a convergence proof.
Contribution
It develops a constrained policy gradient method specifically for finite horizon problems, filling a gap left by existing constrained RL algorithms, which target the infinite horizon setting.
Findings
The algorithm is shown to converge to a constrained optimal policy.
It outperforms existing algorithms in the reported experiments.
It uses function approximation, so it applies when state and action spaces are large or continuous.
Abstract
The infinite horizon setting is widely adopted for problems of reinforcement learning (RL). These invariably result in stationary policies that are optimal. In many situations, finite horizon control problems are of interest, and for such problems the optimal policies are time-varying in general. Another setting that has become popular in recent times is that of Constrained Reinforcement Learning, where the agent maximizes its rewards while also aiming to satisfy some given constraint criteria. However, this setting has only been studied in the context of infinite horizon MDPs, where stationary policies are optimal. We present an algorithm for constrained RL in the Finite Horizon Setting, where the horizon terminates after a fixed (finite) time. We use function approximation in our algorithm, which is essential when the state and action spaces are large or continuous, and use the policy…
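To make the setting concrete, here is a minimal sketch of a stage-dependent policy trained by a constrained policy gradient. It is not the authors' algorithm. The abstract is truncated, so the Lagrangian relaxation, the REINFORCE-style update, the tabular softmax policy (standing in for function approximation), the toy dynamics, and every name and step size below are assumptions made for illustration only.

```python
import math
import random

H, N_STATES, N_ACTIONS = 3, 2, 2   # horizon and tiny state/action spaces (assumed)
ALPHA = 1.0                         # assumed threshold on expected total cost


def softmax(prefs):
    """Numerically stable softmax over a list of action preferences."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]


def step(rng, h, s, a):
    """Hypothetical toy dynamics: reward grows with the stage, action 1 costs 1."""
    reward = float((h + 1) * a)
    cost = float(a)
    next_s = a if rng.random() < 0.9 else 1 - a
    return reward, cost, next_s


def rollout(rng, theta):
    """Sample one episode of exactly H stages; theta[h] is the policy at stage h."""
    s, traj = 0, []
    for h in range(H):
        probs = softmax(theta[h][s])
        a = rng.choices(range(N_ACTIONS), weights=probs)[0]
        r, c, s_next = step(rng, h, s, a)
        traj.append((h, s, a, r, c))
        s = s_next
    return traj


def train(episodes=2000, lr_theta=0.05, lr_lam=0.01, seed=0):
    """Primal-dual updates on the Lagrangian E[sum r] - lam * (E[sum c] - ALPHA)."""
    rng = random.Random(seed)
    # One separate parameter table per stage gives a non-stationary policy.
    theta = [[[0.0] * N_ACTIONS for _ in range(N_STATES)] for _ in range(H)]
    lam = 0.0
    for _ in range(episodes):
        traj = rollout(rng, theta)
        # Reward-to-go of the penalized signal r - lam * c, accumulated backwards.
        togo, g = [0.0] * H, 0.0
        for h, s, a, r, c in reversed(traj):
            g += r - lam * c
            togo[h] = g
        # Primal step: stochastic gradient ascent on the policy parameters.
        for h, s, a, r, c in traj:
            probs = softmax(theta[h][s])
            for b in range(N_ACTIONS):
                grad_log = (1.0 if b == a else 0.0) - probs[b]
                theta[h][s][b] += lr_theta * togo[h] * grad_log
        # Dual step: raise lam when the sampled cost exceeds ALPHA, keep lam >= 0.
        total_cost = sum(t[4] for t in traj)
        lam = max(0.0, lam + lr_lam * (total_cost - ALPHA))
    return theta, lam
```

Calling `theta, lam = train()` and then inspecting `softmax(theta[h][0])` for each `h` shows how the action probabilities can differ from stage to stage, which a single stationary policy cannot express. The smaller step size for `lam` is a common choice in primal-dual schemes; whether the paper uses the same arrangement is not stated in the excerpt above.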
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Advanced Control Systems Optimization
