Constrained Reinforcement Learning with Average Reward Objective: Model-Based and Model-Free Algorithms
Vaneet Aggarwal, Washim Uddin Mondal, Qinbo Bai

TL;DR
This monograph surveys model-based and model-free algorithms for constrained reinforcement learning with an average reward objective, providing theoretical guarantees for ergodic MDPs and extending the results to weakly communicating MDPs.
Contribution
It analyzes primal-dual policy gradient methods for constrained average reward MDPs, gives regret bounds and constraint violation analysis, and extends the discussion from ergodic to weakly communicating MDPs.
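As a rough illustration of what a primal-dual policy gradient scheme for a constrained average reward MDP looks like, the sketch below writes out a generic formulation. It is not taken from the monograph: the sign convention for the constraint ($J_c \ge 0$), the step sizes $\alpha, \beta$, the dual bound $\Lambda$, and the policy parameter $\theta$ are all assumed notation, and the limits are assumed to exist (as they do, for example, under ergodicity).

```latex
% Average reward (g = r) and average constraint value (g = c) of a policy \pi
J_g(\pi) = \lim_{T \to \infty} \frac{1}{T}\,
           \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{T-1} g(s_t, a_t) \right],
           \qquad g \in \{r, c\}

% Constrained problem over parametrized policies \pi_\theta
\max_{\theta} \; J_r(\pi_\theta)
\quad \text{s.t.} \quad J_c(\pi_\theta) \ge 0

% Lagrangian with dual variable \lambda \ge 0
L(\theta, \lambda) = J_r(\pi_\theta) + \lambda\, J_c(\pi_\theta)

% Primal ascent on \theta, projected dual descent on \lambda
\theta_{k+1} = \theta_k + \alpha\, \nabla_\theta L(\theta_k, \lambda_k),
\qquad
\lambda_{k+1} = \mathcal{P}_{[0, \Lambda]}\!\left[ \lambda_k - \beta\, J_c(\pi_{\theta_k}) \right]
```

Under this convention the dual variable grows whenever the current policy violates the constraint ($J_c < 0$), which shifts weight in the primal step toward satisfying the constraint.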
Findings
Regret guarantees for model-based approaches
Constraint violation bounds for model-free algorithms
Extension of results to weakly communicating MDPs
Abstract
Reinforcement Learning (RL) serves as a versatile framework for sequential decision-making, finding applications across diverse domains such as robotics, autonomous driving, recommendation systems, supply chain optimization, biology, mechanics, and finance. The primary objective in these applications is to maximize the average reward. Real-world scenarios often necessitate adherence to specific constraints during the learning process. This monograph focuses on the exploration of various model-based and model-free approaches for Constrained RL within the context of average reward Markov Decision Processes (MDPs). The investigation commences with an examination of model-based strategies, delving into two foundational methods: optimism in the face of uncertainty and posterior sampling. Subsequently, the discussion transitions to parametrized model-free approaches, where the primal-dual…
Taxonomy
Topics: Supply Chain and Inventory Management
