A Time and Space Efficient Algorithm for Contextual Linear Bandits
José Bento, Stratis Ioannidis, S. Muthukrishnan, Jinyun Yan

TL;DR
This paper introduces a computationally efficient algorithm for contextual linear bandits that achieves logarithmic regret with constant per-iteration complexity and fixed space requirements, even with exponentially many contexts.
Contribution
It presents an $\epsilon$-greedy algorithm that overcomes previous scalability issues in contextual linear bandits by maintaining low computation and space complexity.
Findings
Achieves $O(\text{poly}(d) \, \log T)$ regret.
Per-iteration complexity is $O(\text{poly}(d))$, independent of $T$.
Space complexity is $O(Kd^2)$, independent of the number of time steps $T$.
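For reference, the regret in the first finding is commonly defined as below for contextual linear bandits. The notation is assumed here for illustration and may not match the paper's: $x_t$ is the context observed at step $t$, $\theta_a$ is the unknown parameter vector of arm $a$, and $a_t$ is the arm the algorithm plays.

$$
R(T) = \sum_{t=1}^{T} \mathbb{E}\left[ \max_{a \in \{1,\dots,K\}} x_t^\top \theta_a - x_t^\top \theta_{a_t} \right]
$$

Under this definition, an $O(\text{poly}(d) \, \log T)$ bound says the cumulative loss relative to always playing the best arm grows only logarithmically in $T$.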
Abstract
We consider a multi-armed bandit problem where payoffs are a linear function of an observed stochastic contextual variable. In the scenario where there exists a gap between optimal and suboptimal rewards, several algorithms have been proposed that achieve $O(\log T)$ regret after $T$ time steps. However, proposed methods either have a computation complexity per iteration that scales linearly with $T$ or achieve regrets that grow linearly with the number of contexts $|\mathcal{X}|$. We propose an $\epsilon$-greedy type of algorithm that solves both limitations. In particular, when contexts are variables in $\mathbb{R}^d$, we prove that our algorithm has a constant computation complexity per iteration of $O(\text{poly}(d))$ and can achieve a regret of $O(\text{poly}(d) \log T)$ even when $|\mathcal{X}| = \Omega(2^d)$. In addition, unlike previous algorithms, its space complexity scales like $O(Kd^2)$ and…
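To make the complexity claims concrete, here is a minimal sketch of an $\epsilon$-greedy learner with one linear model per arm. It is not the paper's algorithm: the class name, the ridge regularizer, the exploration schedule $\epsilon_t = \min(1, c/t)$, and the choice to update the model on every step are all assumptions made for illustration. What it does show is how storing one $d \times d$ matrix per arm gives $O(Kd^2)$ space, and how a Sherman-Morrison rank-one update keeps the per-step cost polynomial in $d$ and independent of $T$.

```python
import random


class EpsilonGreedyLinearBandit:
    """Illustrative epsilon-greedy learner with one ridge model per arm (hypothetical)."""

    def __init__(self, num_arms, dim, explore_scale=5.0, ridge=1.0, seed=0):
        self.num_arms = num_arms
        self.dim = dim
        # Assumed schedule constant c in eps_t = min(1, c / t).
        self.explore_scale = explore_scale
        self.rng = random.Random(seed)
        self.t = 0
        # Per arm: inverse of (ridge * I + sum of x x^T), a d-by-d matrix,
        # so total storage is O(K d^2) regardless of how many steps are run.
        self.a_inv = [
            [[(1.0 / ridge) if i == j else 0.0 for j in range(dim)]
             for i in range(dim)]
            for _ in range(num_arms)
        ]
        # Per arm: running sum of reward * context (length d).
        self.b = [[0.0] * dim for _ in range(num_arms)]
        # Per arm: current parameter estimate (length d).
        self.theta = [[0.0] * dim for _ in range(num_arms)]

    def select_arm(self, x):
        """Return (arm, explored) for context x."""
        self.t += 1
        eps = min(1.0, self.explore_scale / self.t)
        if self.rng.random() < eps:
            # Explore: pick an arm uniformly at random.
            return self.rng.randrange(self.num_arms), True
        # Exploit: pick the arm with the highest estimated payoff x . theta_a.
        scores = [sum(w * xi for w, xi in zip(th, x)) for th in self.theta]
        return max(range(self.num_arms), key=scores.__getitem__), False

    def update(self, arm, x, reward):
        """Fold one (context, reward) observation into the chosen arm's model."""
        a_inv, b, d = self.a_inv[arm], self.b[arm], self.dim
        # Sherman-Morrison: (A + x x^T)^-1 = A^-1 - (A^-1 x)(A^-1 x)^T / (1 + x^T A^-1 x).
        u = [sum(a_inv[i][j] * x[j] for j in range(d)) for i in range(d)]
        denom = 1.0 + sum(x[i] * u[i] for i in range(d))
        for i in range(d):
            for j in range(d):
                a_inv[i][j] -= u[i] * u[j] / denom
        for i in range(d):
            b[i] += reward * x[i]
        # Ridge estimate theta = A^-1 b, recomputed in O(d^2).
        self.theta[arm] = [
            sum(a_inv[i][j] * b[j] for j in range(d)) for i in range(d)
        ]
```

A typical loop would call `select_arm(x)` on each incoming context, observe the reward for the returned arm, and pass it to `update(arm, x, reward)`. The `explored` flag is returned so a caller can restrict updates to exploration steps if that is preferred.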
Taxonomy
Topics: Advanced Bandit Algorithms Research · Optimization and Search Problems · Machine Learning and Algorithms
