Exploration in Linear Bandits with Rich Action Sets and its Implications for Inference

Debangshu Banerjee; Avishek Ghosh; Sayak Ray Chowdhury; Aditya Gopalan

arXiv:2207.11597·cs.LG·January 10, 2023

Open Access

TL;DR

This paper establishes a non-asymptotic lower bound on the eigenvalues of the design matrix in linear bandits with well-behaved action sets, showing that the minimum eigenvalue grows polynomially in the horizon, and applies the bound to model selection and multi-agent clustering.

Contribution

It provides the first any-time lower bound on the design matrix eigenvalues for rich action spaces in linear bandits, extending previous asymptotic results to practical, finite-time scenarios.

Findings

01

The minimum eigenvalue of the expected design matrix grows as $\Omega(\sqrt{n})$ in well-behaved action spaces.

02

Epoch-based algorithms adapt to the true model complexity at a rate exponential in the number of epochs.

03

No forced exploration needed for multi-agent clustering with spectral bounds.

Abstract

We present a non-asymptotic lower bound on the eigenspectrum of the design matrix generated by any linear bandit algorithm with sub-linear regret when the action set has well-behaved curvature. Specifically, we show that the minimum eigenvalue of the expected design matrix grows as $\Omega(\sqrt{n})$ whenever the expected cumulative regret of the algorithm is $O(\sqrt{n})$, where $n$ is the learning horizon, and the action-space has a constant Hessian around the optimal arm. This shows that such action-spaces force a polynomial lower bound rather than a logarithmic lower bound, as shown by \cite{lattimore2017end}, in discrete (i.e., well-separated) action spaces. Furthermore, while the previous result is shown to hold only in the asymptotic regime (as $n \to \infty$), our result for these "locally rich" action spaces is any-time. Additionally, under a mild technical assumption, we…
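The link between $O(\sqrt{n})$ regret and $\Omega(\sqrt{n})$ eigenvalue growth can be illustrated with a small numerical sketch. This is not the paper's proof or any algorithm from it. It assumes a unit-circle action set in two dimensions, an unknown parameter fixed at (1, 0), and a hand-picked deterministic action sequence that deviates from the optimal arm by an angle of order $n^{-1/4}$. The names `simulate`, `min_eigenvalue_2x2`, and `scale` are hypothetical and introduced only for this example.

```python
import math


def min_eigenvalue_2x2(a, b, c):
    """Smallest eigenvalue of the symmetric matrix [[a, b], [b, c]]."""
    mean = (a + c) / 2.0
    radius = math.hypot((a - c) / 2.0, b)
    return mean - radius


def simulate(n, scale=1.0):
    """Play n actions on the unit circle (an assumed toy action set).

    The optimal arm is (1, 0). Each action sits at angle +delta or -delta
    from it, alternating, with delta = scale * n**(-1/4).
    Returns (cumulative regret, minimum eigenvalue of the design matrix).
    """
    delta = scale * n ** -0.25
    a = b = c = 0.0  # entries of the design matrix sum_t x_t x_t^T
    regret = 0.0
    for t in range(n):
        theta = delta if t % 2 == 0 else -delta
        x, y = math.cos(theta), math.sin(theta)
        a += x * x
        b += x * y
        c += y * y
        regret += 1.0 - x  # reward of (x, y) is x; the optimal reward is 1
    return regret, min_eigenvalue_2x2(a, b, c)


if __name__ == "__main__":
    for n in (10_000, 160_000):
        regret, lam = simulate(n)
        root = math.sqrt(n)
        print(n, regret / root, lam / root)
```

In this toy setting the per-round regret is about delta squared over two, so the cumulative regret scales like $\sqrt{n}$, while the curvature of the circle forces the off-axis component of each action to contribute about delta squared to the smaller eigenvalue, which therefore also scales like $\sqrt{n}$. Both printed ratios should stay roughly constant as `n` grows.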

Taxonomy

Topics: Advanced Bandit Algorithms Research · Optimization and Search Problems · Machine Learning and Algorithms