Reward Biased Maximum Likelihood Estimation for Reinforcement Learning
Akshay Mete, Rahul Singh, Xi Liu, P. R. Kumar

TL;DR
This paper introduces Reward-Biased Maximum Likelihood Estimation (RBMLE) for reinforcement learning, showing that it attains $O(\log T)$ regret and outperforms UCRL2 and Thompson Sampling in simulations when controlling unknown Markov Decision Processes.
Contribution
It extends RBMLE to finite-time reinforcement learning, providing theoretical regret bounds and empirical evidence of outperforming existing algorithms.
Findings
RBMLE achieves $O(\log T)$ regret for MDPs.
Simulation results show RBMLE outperforms UCRL2 and Thompson Sampling.
RBMLE exhibits competitive or superior empirical performance.
Abstract
The Reward-Biased Maximum Likelihood Estimate (RBMLE) for adaptive control of Markov chains was proposed to overcome the central obstacle of what is variously called the fundamental "closed-loop identifiability problem" of adaptive control, the "dual control problem", or, contemporaneously, the "exploration vs. exploitation problem". It exploited the key observation that since the maximum likelihood parameter estimator can asymptotically identify the closed-loop transition probabilities under a certainty equivalent approach, the limiting parameter estimates must necessarily have an optimal reward that is less than the optimal reward attainable for the true but unknown system. Hence it proposed a counteracting reverse bias in favor of parameters with larger optimal rewards, providing a solution to the fundamental problem alluded to above. It thereby proposed an optimistic approach of favoring…
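As a rough illustration of the reverse bias described in the abstract, the sketch below picks a model from a finite candidate set by maximizing the log-likelihood of the observed transitions plus a weighted optimal-reward term. Everything here is an assumption made for illustration: the function names, the finite candidate set, the additive form of the bias, the weight `alpha`, and the precomputed optimal rewards `opt_rewards` are hypothetical and are not taken from the paper, whose exact rule is not given in the truncated abstract.

```python
import math


def log_likelihood(counts, P):
    """Log-likelihood of observed transition counts under a candidate model.

    counts[(s, a, s_next)] is the number of observed transitions.
    P[(s, a, s_next)] is the candidate model's transition probability.
    """
    total = 0.0
    for key, n in counts.items():
        p = P.get(key, 0.0)
        if p <= 0.0:
            # The candidate rules out a transition that was actually observed.
            return -math.inf
        total += n * math.log(p)
    return total


def rbmle_select(counts, candidates, opt_rewards, alpha):
    """Return the index of the candidate with the largest biased score.

    Score = log-likelihood + alpha * optimal reward (illustrative form only).
    With alpha = 0 this reduces to plain maximum likelihood.
    """
    scores = [
        log_likelihood(counts, P) + alpha * J
        for P, J in zip(candidates, opt_rewards)
    ]
    return max(range(len(candidates)), key=scores.__getitem__)


# Hypothetical example: one state, one action, two possible next states.
counts = {(0, 0, 0): 6, (0, 0, 1): 4}
model_a = {(0, 0, 0): 0.6, (0, 0, 1): 0.4}  # fits the data best
model_b = {(0, 0, 0): 0.5, (0, 0, 1): 0.5}  # slightly worse fit, higher reward
opt_rewards = [1.0, 2.0]  # assumed optimal rewards, supplied by hand

ml_choice = rbmle_select(counts, [model_a, model_b], opt_rewards, alpha=0.0)
biased_choice = rbmle_select(counts, [model_a, model_b], opt_rewards, alpha=1.0)
```

In this toy setup the unbiased estimate selects `model_a`, while a positive `alpha` tips the choice toward `model_b`, the candidate with the larger assumed optimal reward. That is the optimistic tilt the abstract describes, shown here only under the stated assumptions.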
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control
