Constrained Reinforcement Learning with Average Reward Objective: Model-Based and Model-Free Algorithms

Vaneet Aggarwal, Washim Uddin Mondal, Qinbo Bai

arXiv:2406.11481 · cs.LG · August 26, 2024

TL;DR

This monograph studies model-based and model-free algorithms for constrained reinforcement learning with an average-reward objective, providing theoretical guarantees (regret and constraint-violation bounds) for both ergodic and weakly communicating MDPs.

Contribution

It presents and analyzes primal-dual policy gradient methods for constrained average-reward MDPs, establishing regret and constraint-violation bounds, and extends the analysis to weakly communicating MDPs.
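The primal-dual idea can be illustrated on a toy problem. The following is a minimal sketch, not the monograph's algorithm: it assumes a hypothetical two-state, two-action constrained MDP with known transitions, a sigmoid policy, and the sign convention that the constraint reads J_g ≥ 0. It computes the long-run averages exactly from the stationary distribution and uses finite-difference gradients of the Lagrangian J_r + λ·J_g in place of a sampled policy-gradient estimator. All names (`averages`, `primal_dual`, `P_TO_STATE1`), numbers, and step sizes are illustrative assumptions.

```python
import math

# Hypothetical toy CMDP: 2 states, 2 actions. The next state depends only on
# the action: action 1 reaches state 1 w.p. 0.8, action 0 w.p. 0.2.
P_TO_STATE1 = {0: 0.2, 1: 0.8}


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def averages(theta):
    """Exact long-run averages (J_r, J_g) of the policy pi(1|s) = sigmoid(theta[s]).

    Assumed reward r(s, a) = 1 if s == 1 else 0.
    Assumed constraint utility g(s, a) = 0.5 - a, so J_g >= 0 means
    action 1 is used at most half of the time.
    """
    x = [sigmoid(t) for t in theta]  # probability of action 1 in each state
    # Two-state chain: p = P(0 -> 1), q = P(1 -> 0).
    p = (1 - x[0]) * P_TO_STATE1[0] + x[0] * P_TO_STATE1[1]
    q = 1 - ((1 - x[1]) * P_TO_STATE1[0] + x[1] * P_TO_STATE1[1])
    d1 = p / (p + q)  # stationary probability of state 1
    d0 = 1 - d1
    j_r = d1
    j_g = 0.5 - (d0 * x[0] + d1 * x[1])
    return j_r, j_g


def lagrangian(theta, lam):
    j_r, j_g = averages(theta)
    return j_r + lam * j_g


def grad_theta(theta, lam, eps=1e-5):
    """Central finite-difference gradient of the Lagrangian w.r.t. theta."""
    grad = []
    for i in range(len(theta)):
        up = list(theta)
        down = list(theta)
        up[i] += eps
        down[i] -= eps
        grad.append((lagrangian(up, lam) - lagrangian(down, lam)) / (2 * eps))
    return grad


def primal_dual(steps=3000, alpha=0.1, beta=0.1):
    """Gradient ascent on theta, projected gradient descent on lambda >= 0."""
    theta, lam = [0.0, 0.0], 0.0
    sum_r = sum_g = 0.0
    for _ in range(steps):
        j_r, j_g = averages(theta)
        sum_r += j_r
        sum_g += j_g
        grad = grad_theta(theta, lam)
        theta = [t + alpha * g for t, g in zip(theta, grad)]
        # Constraint violated (j_g < 0) -> lambda grows; satisfied -> it shrinks.
        lam = max(0.0, lam - beta * j_g)
    return sum_r / steps, sum_g / steps, lam


avg_r, avg_g, lam = primal_dual()
print(f"avg J_r = {avg_r:.3f}, avg J_g = {avg_g:.3f}, lambda = {lam:.3f}")
```

In this toy problem the stationary probability of state 1 equals 0.2 + 0.6·Pr(a = 1), so the constrained optimum is J_r = 0.5 with the constraint active. The script reports time-averaged values because the last iterate can oscillate around that point; since the dual update is projected onto λ ≥ 0 and starts at λ = 0, summing it shows the average constraint value is at least -λ_T/(βT).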

Findings

1. Regret guarantees for model-based approaches
2. Constraint violation bounds for model-free algorithms
3. Extension of results to weakly communicating MDPs

Abstract

Reinforcement Learning (RL) serves as a versatile framework for sequential decision-making, finding applications across diverse domains such as robotics, autonomous driving, recommendation systems, supply chain optimization, biology, mechanics, and finance. The primary objective in these applications is to maximize the average reward. Real-world scenarios often necessitate adherence to specific constraints during the learning process. This monograph focuses on the exploration of various model-based and model-free approaches for Constrained RL within the context of average reward Markov Decision Processes (MDPs). The investigation commences with an examination of model-based strategies, delving into two foundational methods: optimism in the face of uncertainty and posterior sampling. Subsequently, the discussion transitions to parametrized model-free approaches, where the primal-dual…
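For reference, the constrained average-reward problem and the two performance measures named in the findings are often written as follows. This is a sketch under assumed notation (reward r, constraint function c with the constraint stated as J_c^π ≥ 0, an optimal feasible policy π^*, and a clipped cumulative violation); the monograph's exact conventions may differ, for instance by bounding a cost from above instead.

```latex
% Assumed notation: policy \pi, reward r, constraint function c, horizon T.
J_r^{\pi} = \lim_{T\to\infty} \frac{1}{T}\,\mathbb{E}_{\pi}\Big[\sum_{t=0}^{T-1} r(s_t,a_t)\Big],
\qquad
J_c^{\pi} = \lim_{T\to\infty} \frac{1}{T}\,\mathbb{E}_{\pi}\Big[\sum_{t=0}^{T-1} c(s_t,a_t)\Big]

\max_{\pi}\; J_r^{\pi} \quad \text{s.t.} \quad J_c^{\pi} \ge 0

\mathrm{Reg}(T) = \sum_{t=0}^{T-1}\big(J_r^{\pi^*} - r(s_t,a_t)\big),
\qquad
\mathrm{Viol}(T) = \Big(\sum_{t=0}^{T-1} -c(s_t,a_t)\Big)_{+}
```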

Taxonomy

Topics: Supply Chain and Inventory Management