TSEB: More Efficient Thompson Sampling for Policy Learning
P. Prasanna, Sarath Chandar, Balaraman Ravindran

TL;DR
This paper introduces TSEB, an improved Thompson Sampling algorithm with adaptive exploration for more efficient policy learning, offering tighter PAC guarantees and empirical validation in simulated environments.
Contribution
The paper proposes TSEB, a Thompson Sampling algorithm with an adaptive exploration bonus that targets tighter PAC bounds while remaining cautious about regret in model-based learning.
Findings
TSEB achieves tighter PAC guarantees than existing methods.
The adaptive exploration bonus effectively encourages necessary exploration.
Empirical results demonstrate improved performance in simulated domains.
Abstract
In model-based approaches to learning in an unknown environment, exploring to learn the model parameters takes a toll on the regret. Optimal performance with respect to regret or PAC bounds is achievable if the algorithm exploits with respect to reward or explores with respect to the model parameters, respectively. In this paper, we propose TSEB, a Thompson Sampling-based algorithm with an adaptive exploration bonus that aims to solve the problem with tighter PAC guarantees while being cautious about regret as well. The proposed approach maintains distributions over the model parameters, which are successively refined with more experience. At any given time, the agent solves a model sampled from this distribution, and the sampled reward distribution is skewed by an exploration bonus in order to generate more informative exploration. The policy by solving is…
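The loop described in the abstract (maintain a posterior over the model, sample a model, skew the sampled rewards with a bonus, solve, act, update) can be sketched for a small tabular MDP. This is only an illustrative sketch under stated assumptions, not the authors' TSEB: the Dirichlet and Gaussian posteriors, the count-based bonus `beta / sqrt(n + 1)`, and the names `sample_model`, `value_iteration`, and `run` are all hypothetical choices made here for illustration.

```python
import numpy as np

def sample_model(trans_counts, rew_sum, rew_count, rng, beta=1.0):
    """Draw one MDP from the posterior and skew its rewards with a bonus."""
    n_states, n_actions, _ = trans_counts.shape
    # Assumed Dirichlet posterior over next states (prior count of 1 each).
    P = np.empty_like(trans_counts, dtype=float)
    for s in range(n_states):
        for a in range(n_actions):
            P[s, a] = rng.dirichlet(trans_counts[s, a] + 1.0)
    # Assumed Gaussian posterior over mean rewards; narrows with more visits.
    mean = rew_sum / np.maximum(rew_count, 1)
    R = rng.normal(mean, 1.0 / np.sqrt(rew_count + 1.0))
    # Hypothetical bonus (not the paper's): shrinks as a pair is visited more.
    bonus = beta / np.sqrt(rew_count + 1.0)
    return P, R + bonus

def value_iteration(P, R, gamma=0.95, tol=1e-6):
    """Solve the sampled MDP; returns a greedy policy and its value estimate."""
    V = np.zeros(P.shape[0])
    while True:
        Q = R + gamma * (P @ V)  # (S, A, S) @ (S,) -> (S, A)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return Q.argmax(axis=1), V_new
        V = V_new

def run(true_P, true_R, episodes=50, horizon=20, seed=0):
    """Sample a model, solve it, act in the true MDP, and update the counts."""
    rng = np.random.default_rng(seed)
    S, A, _ = true_P.shape
    trans_counts = np.zeros((S, A, S))
    rew_sum = np.zeros((S, A))
    rew_count = np.zeros((S, A))
    for _ in range(episodes):
        P, R = sample_model(trans_counts, rew_sum, rew_count, rng)
        policy, _ = value_iteration(P, R)
        s = 0
        for _ in range(horizon):
            a = policy[s]
            s_next = rng.choice(S, p=true_P[s, a])
            r = true_R[s, a] + rng.normal(0.0, 0.1)
            trans_counts[s, a, s_next] += 1
            rew_sum[s, a] += r
            rew_count[s, a] += 1
            s = s_next
    return trans_counts, rew_count
```

In this sketch the bonus is largest for rarely visited state-action pairs, so the solved policy is pulled toward them early and the pull fades as counts grow. How TSEB actually adapts its bonus is not specified in the text above.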
Taxonomy
Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Reinforcement Learning in Robotics
