Reward Biased Maximum Likelihood Estimation for Learning in Constrained MDPs
Rahul Singh

TL;DR
This paper applies the Reward Biased Maximum Likelihood Estimation (RBMLE) algorithm to learning optimal policies in constrained Markov Decision Processes (CMDPs) and analyzes its learning regret.
Contribution
It uses the Reward Biased Maximum Likelihood Estimation (RBMLE) method to learn optimal policies for CMDPs and provides a theoretical analysis of its learning regret.
Findings
RBMLE is used to learn optimal policies in CMDPs.
Theoretical regret bounds for RBMLE are established.
The approach applies RBMLE-based learning to decision-making problems with constraints.
Abstract
We use the Reward Biased Maximum Likelihood Estimation (RBMLE) algorithm to learn optimal policies for constrained Markov Decision Processes (CMDPs). We analyze the learning regrets of RBMLE.
Taxonomy
Topics: Reinforcement Learning in Robotics · Machine Learning and Algorithms · Data Stream Mining Techniques
