Low-rank Matrix Bandits with Heavy-tailed Rewards
Yue Kang, Cho-Jui Hsieh, Thomas C. M. Lee

TL;DR
This paper introduces LOTUS, a novel algorithm for low-rank matrix bandits with heavy-tailed rewards, achieving near-optimal regret bounds without prior knowledge of the time horizon $T$.
Contribution
The work develops LOTUS, the first algorithm for heavy-tailed reward matrix bandits with theoretical guarantees matching lower bounds, and improves it for high-dimensional settings.
Findings
LOTUS attains near-optimal regret bounds for heavy-tailed rewards.
The lower bound matches the regret order of LOTUS, indicating near-optimality.
Simulations demonstrate the practical effectiveness of the proposed algorithm.
Abstract
In stochastic low-rank matrix bandit, the expected reward of an arm is equal to the inner product between its feature matrix and some unknown $d_1$ by $d_2$ low-rank parameter matrix $\Theta^*$ with rank $r \ll d_1 \wedge d_2$. While all prior studies assume the payoffs are mixed with sub-Gaussian noises, in this work we loosen this strict assumption and consider the new problem of low-rank matrix bandit with heavy-tailed rewards (LowHTR), where the rewards only have finite $(1+\delta)$ moment for some $\delta \in (0,1]$. By utilizing the truncation on observed payoffs and the dynamic exploration, we propose a novel algorithm called LOTUS attaining the regret bound of order $\tilde{O}(d^{3/2} r^{1/2} T^{1/(1+\delta)} / \tilde{D}_{rr})$ without knowing $T$, which matches the state-of-the-art regret bound under sub-Gaussian…
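The reward model and the payoff truncation mentioned in the abstract can be illustrated with a minimal sketch. This is not the LOTUS algorithm: the function names, the Student-t noise, the dimensions, and the fixed threshold `tau` are all assumptions chosen for illustration, and the truncation shown (zeroing out large payoffs) is one common form from the heavy-tailed bandit literature that may differ from the paper's.

```python
import numpy as np

def make_low_rank(d1, d2, r, rng):
    """Random d1 x d2 matrix of rank at most r (hypothetical stand-in for the unknown parameter)."""
    U = rng.standard_normal((d1, r))
    V = rng.standard_normal((d2, r))
    return U @ V.T / np.sqrt(d1 * d2)

def heavy_tailed_reward(X, theta, df, rng):
    """Inner product <X, theta> plus Student-t noise.

    Student-t noise with df degrees of freedom has finite moments only of
    order below df, so df in (1, 2] gives infinite variance (assumed noise model).
    """
    return float(np.sum(X * theta) + rng.standard_t(df))

def truncate(y, tau):
    """Zero out payoffs whose magnitude exceeds the threshold tau."""
    y = np.asarray(y, dtype=float)
    return np.where(np.abs(y) <= tau, y, 0.0)

rng = np.random.default_rng(0)
theta = make_low_rank(8, 6, 2, rng)      # rank-2 parameter, 8 x 6
X = rng.standard_normal((8, 6))          # feature matrix of one arm
rewards = np.array(
    [heavy_tailed_reward(X, theta, df=1.8, rng=rng) for _ in range(1000)]
)
clipped = truncate(rewards, tau=10.0)    # tau is an illustrative constant
```

In practice the threshold would typically grow with the number of observations rather than stay fixed; a constant is used here only to keep the sketch short.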
Taxonomy
Topics: Advanced Bandit Algorithms Research · Sparse and Compressive Sensing Techniques
