Truncated LinUCB for Stochastic Linear Bandits
Yanglei Song, Meng Zhou

TL;DR
This paper introduces Tr-LinUCB, a truncated version of LinUCB for stochastic linear bandits. It follows LinUCB up to a truncation time and exploits greedily afterwards, which curbs LinUCB's over-exploration and yields rate-optimal regret in low-dimensional settings.
Contribution
The paper proposes Tr-LinUCB, a truncation-based algorithm that improves regret bounds and demonstrates rate optimality in linear bandit problems.
Findings
Tr-LinUCB achieves $O(d \log(T))$ regret with proper truncation.
A matching lower bound confirms the rate optimality of Tr-LinUCB.
The algorithm's performance is insensitive to the choice of truncation time in low dimensions.
Abstract
This paper considers contextual bandits with a finite number of arms, where the contexts are independent and identically distributed $d$-dimensional random vectors, and the expected rewards are linear in both the arm parameters and contexts. The LinUCB algorithm, which is near minimax optimal for related linear bandits, is shown to have a cumulative regret that is suboptimal in both the dimension $d$ and time horizon $T$, due to its over-exploration. A truncated version of LinUCB is proposed and termed "Tr-LinUCB", which follows LinUCB up to a truncation time $S$ and performs pure exploitation afterwards. The Tr-LinUCB algorithm is shown to achieve $O(d \log(T))$ regret if $S = Cd \log(T)$ for a sufficiently large constant $C$, and a matching lower bound is established, which shows the rate optimality of Tr-LinUCB in both $d$ and $T$ under a low dimensional regime. Further, if $S =…
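The two-phase rule described in the abstract (UCB-based selection up to the truncation time, greedy selection afterwards) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class name `TrLinUCB`, the helper `simulate`, the per-arm ridge-regression estimator, the bonus width `beta`, the ridge parameter `lam`, the constant `c`, the Gaussian contexts, and the noise level are all assumptions made for the example. The sketch also keeps updating the estimates after the truncation time, which is an assumption about the exploitation phase.

```python
import numpy as np


class TrLinUCB:
    """Sketch of truncated LinUCB with one ridge-regression model per arm."""

    def __init__(self, n_arms, dim, trunc_time, beta=1.0, lam=1.0):
        self.trunc_time = trunc_time  # S: last round that uses the UCB bonus
        self.beta = beta              # bonus width (hypothetical, untuned)
        self.t = 0                    # number of rounds played so far
        # Per-arm Gram matrix V_k = lam * I + sum x x^T and vector b_k = sum y x.
        self.V = np.stack([lam * np.eye(dim) for _ in range(n_arms)])
        self.b = np.zeros((n_arms, dim))

    def select(self, x):
        """Return the arm to pull for context x."""
        self.t += 1
        scores = np.empty(len(self.V))
        for k in range(len(self.V)):
            theta_hat = np.linalg.solve(self.V[k], self.b[k])
            scores[k] = theta_hat @ x
            if self.t <= self.trunc_time:
                # LinUCB phase: add the exploration bonus.
                scores[k] += self.beta * np.sqrt(x @ np.linalg.solve(self.V[k], x))
        # After the truncation time the scores are plain estimates (greedy).
        return int(np.argmax(scores))

    def update(self, arm, x, reward):
        """Fold the observed (context, reward) pair into the pulled arm's model."""
        self.V[arm] += np.outer(x, x)
        self.b[arm] += reward * x


def simulate(horizon=2000, dim=3, n_arms=2, c=5.0, seed=0):
    """Run the sketch on synthetic data; returns (truncation time, regret)."""
    rng = np.random.default_rng(seed)
    theta = rng.normal(size=(n_arms, dim))  # hypothetical true arm parameters
    trunc_time = int(np.ceil(c * dim * np.log(horizon)))  # S = C d log(T)
    agent = TrLinUCB(n_arms, dim, trunc_time)
    regret = 0.0
    for _ in range(horizon):
        x = rng.normal(size=dim)  # i.i.d. context
        arm = agent.select(x)
        means = theta @ x         # expected reward of every arm
        agent.update(arm, x, means[arm] + 0.1 * rng.normal())
        regret += means.max() - means[arm]
    return trunc_time, regret
```

Calling `simulate()` returns the truncation time used and the cumulative regret of one synthetic run. Varying `c` is a simple way to probe how the regret reacts to the choice of truncation time under these assumptions.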
Taxonomy
Topics: Advanced Bandit Algorithms Research · Stochastic Gradient Optimization Techniques · Optimization and Search Problems
