Reward Biased Maximum Likelihood Estimation for Learning in Constrained MDPs

Rahul Singh

arXiv:2105.13919·math.OC·May 31, 2021

Open Access

TL;DR

This paper applies the Reward Biased Maximum Likelihood Estimation (RBMLE) algorithm to learning optimal policies in constrained Markov Decision Processes (CMDPs) and analyzes its learning regret.

Contribution

It uses RBMLE to learn optimal policies for CMDPs and analyzes the learning regret of the resulting algorithm.

Findings

01  RBMLE is used to learn optimal policies for CMDPs.

02  The learning regret of RBMLE in this setting is analyzed.

03  The approach carries RBMLE over to learning problems with constraints.

Abstract

We use the Reward Biased Maximum Likelihood Estimation (RBMLE) algorithm to learn optimal policies for constrained Markov Decision Processes (CMDPs). We analyze the learning regrets of RBMLE.
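
For readers new to the setting, the block below sketches what a CMDP and a learning regret usually look like. It is not taken from the paper. The average-reward formulation, the symbols (costs c_i, bounds \bar{c}_i, number of constraints M, optimal value J^\star, bias weight \alpha(t), likelihood L_t) and the exact form of the biased estimate are all assumptions chosen for illustration, and the paper's own definitions may differ.

```latex
% Assumed notation (not from the source): state s_t, action a_t, reward r,
% costs c_i with bounds \bar{c}_i, and M constraints.
% A CMDP in a standard average-reward form:
\begin{align*}
\max_{\pi}\quad & \liminf_{T\to\infty} \frac{1}{T}\,
  \mathbb{E}_{\pi}\Big[\sum_{t=1}^{T} r(s_t,a_t)\Big] \\
\text{s.t.}\quad & \limsup_{T\to\infty} \frac{1}{T}\,
  \mathbb{E}_{\pi}\Big[\sum_{t=1}^{T} c_i(s_t,a_t)\Big] \le \bar{c}_i,
  \qquad i = 1,\dots,M.
\end{align*}

% One common way to measure learning regret over T steps, with J^\star the
% optimal value of the problem above: a reward regret and one cost regret
% per constraint.
\begin{align*}
R^{(r)}(T)   &= T\,J^\star - \sum_{t=1}^{T} r(s_t,a_t), \\
R^{(c_i)}(T) &= \sum_{t=1}^{T} c_i(s_t,a_t) - T\,\bar{c}_i .
\end{align*}

% The reward-biasing idea in a generic form: the likelihood L_t(\theta) of the
% data seen so far is tilted toward models \theta with a larger optimal value
% J^\star(\theta), using an assumed weight \alpha(t) that grows slower than t.
\[
\theta_t \in \arg\max_{\theta \in \Theta}
  \Big\{ \log L_t(\theta) + \alpha(t)\, J^\star(\theta) \Big\}
\]
```

Under this reading, the learner would act according to a policy that is optimal for the CMDP with parameter \theta_t, and the regret analysis would bound how fast the quantities R^{(r)}(T) and R^{(c_i)}(T) grow with T.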

Taxonomy

Topics: Reinforcement Learning in Robotics · Machine Learning and Algorithms · Data Stream Mining Techniques