A Refined Analysis of UCBVI

Simone Drago; Marco Mussi; Alberto Maria Metelli

arXiv:2502.17370 · stat.ML · May 30, 2025

TL;DR

This paper refines the theoretical analysis of the UCBVI algorithm, improving both its bonus terms and its regret bound, and compares the refined version empirically with the original UCBVI and the state-of-the-art MVP algorithm.

Contribution

It provides a tighter regret analysis of UCBVI, introduces improved bonus terms, and empirically validates the benefits of these enhancements.

Findings

01. Improved regret bounds for UCBVI.

02. Enhanced empirical performance with refined bonus terms.

03. Empirical comparison of the refined UCBVI against both the original UCBVI and the state-of-the-art MVP algorithm.

Abstract

In this work, we provide a refined analysis of the UCBVI algorithm (Azar et al., 2017), improving both the bonus terms and the regret analysis. Additionally, we compare our version of UCBVI with both its original version and the state-of-the-art MVP algorithm. Our empirical validation demonstrates that improving the multiplicative constants in the bounds has significant positive effects on the empirical performance of the algorithms.
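
To illustrate why the multiplicative constant in a bonus term can matter in practice, the following is a minimal sketch of optimistic backward induction for a tabular finite-horizon MDP. It uses a generic Hoeffding-style bonus of the form bonus_scale * H * sqrt(log_term / n), which is an assumption made for illustration and is not the refined bonus derived in the paper. The function name and the parameters `bonus_scale` and `log_term` are hypothetical.

```python
import math


def optimistic_value_iteration(counts, reward_sums, next_counts, H,
                               bonus_scale, log_term):
    """Backward induction with a count-based exploration bonus.

    counts[s][a]          number of visits to (s, a)
    reward_sums[s][a]     sum of observed rewards at (s, a), rewards in [0, 1]
    next_counts[s][a][t]  number of observed transitions (s, a) -> t
    bonus_scale, log_term hypothetical knobs of a generic Hoeffding-style
                          bonus (NOT the paper's refined bonus)
    """
    S, A = len(counts), len(counts[0])
    V = [[0.0] * S for _ in range(H + 1)]  # V[H] is the terminal value, 0
    Q = [[[0.0] * A for _ in range(S)] for _ in range(H)]
    for h in range(H - 1, -1, -1):
        cap = float(H - h)  # largest return achievable from stage h
        for s in range(S):
            for a in range(A):
                n = counts[s][a]
                if n == 0:
                    Q[h][s][a] = cap  # unvisited pair: maximal optimism
                    continue
                r_hat = reward_sums[s][a] / n
                expected_next = sum(
                    next_counts[s][a][t] / n * V[h + 1][t] for t in range(S)
                )
                bonus = bonus_scale * H * math.sqrt(log_term / n)
                Q[h][s][a] = min(cap, r_hat + expected_next + bonus)
            V[h][s] = max(Q[h][s])
    return Q, V


# Toy data: one state, one action, 4 visits, mean reward 0.5, horizon 2.
counts, reward_sums, next_counts = [[4]], [[2.0]], [[[4]]]
_, v_loose = optimistic_value_iteration(counts, reward_sums, next_counts,
                                        H=2, bonus_scale=1.0, log_term=1.0)
_, v_tight = optimistic_value_iteration(counts, reward_sums, next_counts,
                                        H=2, bonus_scale=0.1, log_term=1.0)
print(v_loose[0][0], v_tight[0][0])
```

In this toy instance the larger constant pushes the optimistic value up to the trivial cap H, while the smaller constant keeps the estimate close to the empirical value, so the estimate with the smaller constant stays informative after the same number of visits.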
