Optimal UCB Adjustments for Large Arm Sizes

Hock Peng Chan; Shouri Hu

arXiv:1909.02229·math.ST·September 6, 2019·1 cites

Open Access

TL;DR

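This paper introduces UCB-Large, a UCB algorithm for settings where the number of arms grows polynomially with sample size. It exploits the best-performing arm more often as the number of arms increases, achieves a regret lower bound smaller than the classical Lai and Robbins (1985) bound, and outperforms classical UCB and Thompson sampling in numerical experiments.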

Contribution

The paper develops UCB-Large, an algorithm that adaptively exploits more when there are more arms. It achieves the smaller regret lower bound that applies when the number of arms grows polynomially with sample size, and is therefore optimal in that setting.

Findings

01

UCB-Large attains the smaller regret lower bound for large arm sizes.

02

Numerical experiments demonstrate UCB-Large outperforms classical UCB and Thompson sampling.

03

Theoretical analysis confirms the optimality of UCB-Large in large-arm regimes.

Abstract

The regret lower bound of Lai and Robbins (1985), the gold standard for checking optimality of bandit algorithms, considers arm size fixed as sample size goes to infinity. We show that when arm size increases polynomially with sample size, a surprisingly smaller lower bound is achievable. This is because the larger experimentation costs when there are more arms permit regret savings by exploiting the best performer more often. In particular we are able to construct a UCB-Large algorithm that adaptively exploits more when there are more arms. It achieves the smaller lower bound and is thus optimal. Numerical experiments show that UCB-Large performs better than classical UCB that does not correct for arm size, and better than Thompson sampling.
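
For context, the classical baseline that the abstract compares against can be sketched as follows. This is a minimal sketch of the textbook UCB1 index (sample mean plus sqrt(2 log t / n)), not the paper's UCB-Large, whose arm-size adjustment is not specified in this summary. The Bernoulli reward model, the function name `ucb1`, and the arm means in the usage line are illustrative assumptions.

```python
import math
import random


def ucb1(means, horizon, seed=0):
    """Run textbook UCB1 on Bernoulli arms (an assumed reward model).

    Returns the cumulative pseudo-regret and the number of pulls per arm.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k    # pulls per arm
    sums = [0.0] * k    # total reward per arm
    best = max(means)
    regret = 0.0

    for t in range(1, horizon + 1):
        if t <= k:
            # Initial round-robin: every arm is tried once.
            arm = t - 1
        else:
            # Pick the arm with the largest upper confidence bound.
            arm = max(
                range(k),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        regret += best - means[arm]

    return regret, counts


# Hypothetical three-arm instance, for illustration only.
regret, counts = ucb1([0.9, 0.5, 0.1], horizon=2000)
print(regret, counts)
```

In this sketch the exploration bonus does not depend on the number of arms, and every arm is sampled at least once, so the cost of experimentation grows with the arm count. That is the cost the abstract refers to when it says more arms permit regret savings from exploiting the best performer more often.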

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Auction Theory and Applications