Tangential Randomization in Linear Bandits (TRAiL): Guaranteed Inference and Regret Bounds
Arda Güçlü, Subhonmesh Bose

TL;DR
This paper introduces TRAiL, a new efficient algorithm for linear bandits that guarantees near-optimal regret bounds and provides insights into the trade-off between inference quality and regret growth.
Contribution
TRAiL is a novel, computationally efficient exploration algorithm with proven regret and inference bounds, expanding understanding of the trade-offs in linear bandit problems.
Findings
TRAiL achieves an $\tilde{O}(\sqrt{T})$ regret bound.
A new minimax lower bound for linear bandits is established.
The trade-off between regret growth and inference quality is characterized.
Abstract
We propose and analyze TRAiL (Tangential Randomization in Linear Bandits), a computationally efficient, regret-optimal forced exploration algorithm for linear bandits on action sets that are sublevel sets of strongly convex functions. TRAiL estimates the governing parameter of the linear bandit problem through standard regularized least squares. It then perturbs the reward-maximizing action corresponding to that point estimate along the tangent plane of the convex compact action set before projecting back onto the set. Exploiting concentration results for matrix martingales, we prove that TRAiL ensures growth in the inference quality, measured via the minimum eigenvalue of the design (regressor) matrix, with high probability over a $T$-length period. We build on this result to obtain an upper bound on cumulative regret with probability at least $…
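The estimate, perturb, and project loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the action set is the Euclidean unit ball (one example of a sublevel set of a strongly convex function), Gaussian reward noise, and a perturbation scale of $t^{-1/4}$; all three are illustrative choices rather than details taken from the paper. The function names (`ridge_update`, `estimate`, `tangential_action`, `run_sketch`) are hypothetical.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    return math.sqrt(dot(u, u))


def ridge_update(V_inv, b, x, r):
    """Fold one (action, reward) pair into the regularized least-squares state.

    V_inv tracks (lam*I + sum of x x^T)^{-1} via the Sherman-Morrison identity,
    and b tracks the sum of r*x. Both are modified in place.
    """
    d = len(x)
    Vx = [dot(V_inv[i], x) for i in range(d)]
    denom = 1.0 + dot(x, Vx)
    for i in range(d):
        for j in range(d):
            V_inv[i][j] -= Vx[i] * Vx[j] / denom
        b[i] += r * x[i]


def estimate(V_inv, b):
    """Regularized least-squares point estimate V^{-1} b."""
    return [dot(row, b) for row in V_inv]


def tangential_action(theta_hat, scale, rng):
    """Perturb the greedy action along the tangent plane, then project back.

    Assumption: the action set is the unit ball, so the greedy action is
    theta_hat / ||theta_hat|| and projection is a rescaling to norm <= 1.
    """
    d = len(theta_hat)
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    n = norm(theta_hat)
    if n == 0.0:
        # No usable estimate yet: play a random direction on the sphere.
        gn = norm(g)
        return [v / gn for v in g]
    a = [v / n for v in theta_hat]                 # greedy action
    c = dot(g, a)
    tang = [gi - c * ai for gi, ai in zip(g, a)]   # drop the normal component
    tn = norm(tang)
    if tn > 0.0:
        tang = [v / tn for v in tang]
    x = [ai + scale * ti for ai, ti in zip(a, tang)]
    xn = norm(x)
    return [v / xn for v in x] if xn > 1.0 else x  # project onto the ball


def run_sketch(theta_star, T=2000, lam=1.0, noise=0.1, seed=0):
    """Simulate the loop; returns the final estimate and cumulative regret."""
    rng = random.Random(seed)
    d = len(theta_star)
    V_inv = [[1.0 / lam if i == j else 0.0 for j in range(d)] for i in range(d)]
    b = [0.0] * d
    best = norm(theta_star)  # optimal mean reward on the unit ball
    regret = 0.0
    for t in range(1, T + 1):
        theta_hat = estimate(V_inv, b)
        # Hypothetical schedule: the perturbation shrinks as t ** -0.25.
        x = tangential_action(theta_hat, t ** -0.25, rng)
        mean = dot(theta_star, x)
        ridge_update(V_inv, b, x, mean + rng.gauss(0.0, noise))
        regret += best - mean
    return estimate(V_inv, b), regret
```

Calling `run_sketch([1.0, 0.5, -0.5])` returns the final parameter estimate and the accumulated regret for one simulated run. The perturbation schedule controls the trade-off the paper studies: larger perturbations grow the minimum eigenvalue of the design matrix faster (better inference) at the cost of more regret per round.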
Taxonomy
Topics: Advanced Bandit Algorithms Research · Distributed Sensor Networks and Detection Algorithms · Data Stream Mining Techniques
Methods: Sparse Evolutionary Training
