A Refined Analysis of UCBVI
Simone Drago, Marco Mussi, Alberto Maria Metelli

TL;DR
This paper revisits the UCBVI algorithm with a refined theoretical analysis and tighter bonus terms, yielding improved regret bounds, and empirically compares the resulting variant with the original UCBVI and with the state-of-the-art MVP algorithm.
Contribution
It provides a tighter regret analysis of UCBVI, introduces improved bonus terms, and empirically validates the benefits of these enhancements.
Findings
Improved regret bounds for UCBVI.
Enhanced empirical performance with refined bonus terms.
Empirical comparison with the original UCBVI and the state-of-the-art MVP algorithm.
Abstract
In this work, we provide a refined analysis of the UCBVI algorithm (Azar et al., 2017), improving both the bonus terms and the regret analysis. Additionally, we compare our version of UCBVI with both its original version and the state-of-the-art MVP algorithm. Our empirical validation demonstrates that improving the multiplicative constants in the bounds has significant positive effects on the empirical performance of the algorithms.
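For readers unfamiliar with UCBVI, the following is a minimal sketch of the optimistic backward-induction step at its core. It assumes a tabular finite-horizon MDP with rewards in [0, 1] and a simple Hoeffding-style bonus of the form c * H * sqrt(log_term / n). The names `hoeffding_bonus`, `optimistic_value_iteration`, `c`, and `log_term` are hypothetical and chosen for illustration; this is neither the refined bonus derived in the paper nor the exact constants of Azar et al. (2017).

```python
import numpy as np


def hoeffding_bonus(n_visits, H, log_term, c=1.0):
    """Hoeffding-style bonus c * H * sqrt(log_term / n) (illustrative form).

    `c` is a hypothetical multiplicative constant and `log_term` stands in
    for the logarithmic confidence factor; both are assumptions, not the
    paper's values. Unvisited pairs are treated as visited once.
    """
    return c * H * np.sqrt(log_term / np.maximum(n_visits, 1))


def optimistic_value_iteration(P_hat, R_hat, n_visits, H, log_term, c=1.0):
    """Backward induction on the empirical MDP with an optimism bonus.

    P_hat:    (S, A, S) empirical transition probabilities
    R_hat:    (S, A) empirical mean rewards in [0, 1]
    n_visits: (S, A) visit counts
    Returns Q with shape (H, S, A) and V with shape (H + 1, S).
    """
    S, A = R_hat.shape
    Q = np.zeros((H, S, A))
    V = np.zeros((H + 1, S))  # V[H] = 0: no reward after the horizon
    bonus = hoeffding_bonus(n_visits, H, log_term, c)
    for h in range(H - 1, -1, -1):
        # Empirical Bellman backup plus exploration bonus, shape (S, A)
        Q[h] = R_hat + P_hat @ V[h + 1] + bonus
        # Clip at the largest achievable return from stage h (H - h steps);
        # Azar et al. clip at H, this tighter cap is a variant assumed here
        Q[h] = np.minimum(Q[h], H - h)
        V[h] = Q[h].max(axis=1)
    return Q, V


if __name__ == "__main__":
    # Toy 2-state, 2-action MDP: action 0 moves to state 0, action 1 to state 1
    P = np.zeros((2, 2, 2))
    P[:, 0, 0] = 1.0
    P[:, 1, 1] = 1.0
    R = np.array([[0.0, 0.5], [1.0, 0.0]])
    n = np.ones((2, 2))
    _, V_plain = optimistic_value_iteration(P, R, n, H=2, log_term=1.0, c=0.0)
    _, V_opt = optimistic_value_iteration(P, R, n, H=2, log_term=1.0, c=1.0)
    print(V_plain[0])  # [1.5 1.5]
    print(V_opt[0])    # [2. 2.]
```

Exposing `c` as a parameter mirrors the abstract's point that multiplicative constants matter: a larger `c` pushes every Q-value toward the clipping ceiling, so the agent keeps exploring longer before exploiting. Tightening such constants while keeping the confidence bounds valid is the kind of change the abstract credits with better empirical performance.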
