HAVER: Instance-Dependent Error Bounds for Maximum Mean Estimation and Applications to Q-Learning and Monte Carlo Tree Search

Tuan Ngo Nguyen; Jay Barrett; Kwang-Sung Jun

arXiv:2411.00405·stat.ML·April 30, 2025

Open Access

TL;DR

This paper introduces HAVER, a novel algorithm for estimating the maximum mean among multiple distributions, with superior error bounds and practical effectiveness in Q-learning and MCTS applications.

Contribution

HAVER provides the first instance-dependent error bounds for maximum mean estimation and demonstrates improved rates over the oracle and naive methods.

Findings

01

HAVER achieves better error rates than the oracle when many distributions have means near the best.

02

HAVER outperforms baseline methods in numerical experiments.

03

A mean squared error analysis shows that HAVER estimates the maximum mean as well as an oracle that knows which distribution is best.

Abstract

We study the problem of estimating the value of the largest mean among K distributions via samples from them (rather than estimating which distribution has the largest mean), a problem that arises in various machine learning tasks including Q-learning and Monte Carlo Tree Search (MCTS). While a few algorithms have been proposed, their performance analyses have been limited to their biases rather than a precise error metric. In this paper, we propose a novel algorithm called HAVER (Head AVERaging) and analyze its mean squared error. Our analysis reveals that HAVER has compelling performance in two respects. First, HAVER estimates the maximum mean as well as the oracle that knows the identity of the best distribution and reports its sample mean. Second, perhaps surprisingly, HAVER exhibits even better rates than this oracle when there are many distributions near the best…
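To make the estimation problem in the abstract concrete, the following is a minimal simulation sketch and is not taken from the paper. It compares two reference estimators of the maximum mean: the naive estimator (the largest sample mean) and the oracle described above (the sample mean of the truly best distribution). The function name `simulate`, the unit-variance Gaussian distributions, and all parameter values are assumptions chosen for illustration. HAVER itself is not implemented here, because this excerpt does not specify the algorithm.

```python
import numpy as np

def simulate(means, n_samples, n_trials, seed=0):
    """Monte Carlo estimate of bias and MSE for two maximum-mean estimators.

    Assumption: each distribution is Gaussian with unit variance, and every
    distribution receives the same number of samples.
    """
    rng = np.random.default_rng(seed)
    means = np.asarray(means, dtype=float)
    true_max = means.max()
    best = int(means.argmax())  # identity of a best distribution (oracle knowledge)

    # samples[t, i, j] is the j-th draw from distribution i in trial t.
    samples = rng.normal(
        loc=means[None, :, None],
        scale=1.0,
        size=(n_trials, means.size, n_samples),
    )
    sample_means = samples.mean(axis=2)

    naive = sample_means.max(axis=1)   # largest empirical mean
    oracle = sample_means[:, best]     # sample mean of the known best distribution

    return {
        name: {
            "bias": float((est - true_max).mean()),
            "mse": float(((est - true_max) ** 2).mean()),
        }
        for name, est in (("naive", naive), ("oracle", oracle))
    }

# Ten distributions with identical means: every distribution is "near the best".
results = simulate(means=[1.0] * 10, n_samples=20, n_trials=5000)
for name, stats in results.items():
    print(f"{name:>6}: bias={stats['bias']:+.3f}  mse={stats['mse']:.3f}")
```

In this instance the naive estimator overestimates the maximum mean, because it takes a maximum over noisy sample means, while the oracle is unbiased but uses samples from only one distribution. This is the kind of regime in which the abstract states that averaging over several near-best distributions can improve on the oracle.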

Taxonomy

Topics: Machine Learning and Algorithms · Machine Learning and Data Classification · Domain Adaptation and Few-Shot Learning

Methods: Q-Learning