Low-rank Matrix Bandits with Heavy-tailed Rewards

Yue Kang; Cho-Jui Hsieh; Thomas C. M. Lee

arXiv:2404.17709·stat.ML·April 30, 2024


Open Access

TL;DR

This paper introduces LOTUS, an algorithm for low-rank matrix bandits with heavy-tailed rewards that achieves near-optimal regret bounds without knowing the time horizon $T$.

Contribution

The work develops LOTUS, the first algorithm for heavy-tailed reward matrix bandits with theoretical guarantees matching lower bounds, and improves it for high-dimensional settings.

Findings

1. LOTUS attains near-optimal regret bounds for heavy-tailed rewards.

2. The lower bound matches the regret order of LOTUS, indicating near-optimality.

3. Simulations demonstrate the practical effectiveness of the proposed algorithm.

Abstract

In stochastic low-rank matrix bandit, the expected reward of an arm is equal to the inner product between its feature matrix and some unknown $d_{1}$ by $d_{2}$ low-rank parameter matrix $\Theta^{*}$ with rank $r \ll d_{1} \land d_{2}$. While all prior studies assume the payoffs are mixed with sub-Gaussian noises, in this work we loosen this strict assumption and consider the new problem of low-rank matrix bandit with heavy-tailed rewards (LowHTR), where the rewards only have finite $(1 + \delta)$ moment for some $\delta \in (0, 1]$. By utilizing the truncation on observed payoffs and the dynamic exploration, we propose a novel algorithm called LOTUS attaining the regret bound of order $\tilde{O} (d^{\frac{3}{2}} r^{\frac{1}{2}} T^{\frac{1}{1 + \delta}} / \tilde{D}_{r r})$ without knowing $T$, which matches the state-of-the-art regret bound under sub-Gaussian…
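
Two ingredients named in the abstract, the inner-product reward model and truncation of observed payoffs, can be illustrated with a minimal Python sketch. This is not the LOTUS algorithm: the rank-1 parameter, the symmetrized Pareto noise, the fixed threshold `tau`, and every function name below are assumptions made for illustration. The paper's actual truncation rule and exploration schedule are not stated in the text above.

```python
import random


def inner_product(X, Theta):
    """Frobenius inner product <X, Theta> = sum over i, j of X[i][j] * Theta[i][j]."""
    return sum(x * t for row_x, row_t in zip(X, Theta) for x, t in zip(row_x, row_t))


def outer(u, v):
    """Rank-1 matrix u v^T, used here as a stand-in low-rank parameter."""
    return [[ui * vj for vj in v] for ui in u]


def heavy_tailed_noise(rng, alpha=1.5):
    """Symmetrized Pareto noise (assumed for illustration): zero mean,
    finite moments only of order strictly below alpha."""
    magnitude = rng.paretovariate(alpha)
    return magnitude if rng.random() < 0.5 else -magnitude


def truncate(y, tau):
    """Keep a payoff only if |y| <= tau; otherwise replace it with 0."""
    return y if abs(y) <= tau else 0.0


def simulate_payoffs(X, Theta, n, tau, seed=0):
    """Draw n noisy payoffs for arm X and return (raw_mean, truncated_mean)."""
    rng = random.Random(seed)
    mean_reward = inner_product(X, Theta)
    payoffs = [mean_reward + heavy_tailed_noise(rng) for _ in range(n)]
    raw_mean = sum(payoffs) / n
    truncated_mean = sum(truncate(y, tau) for y in payoffs) / n
    return raw_mean, truncated_mean


if __name__ == "__main__":
    Theta = outer([1.0, 0.5, 0.0], [0.5, 1.0])  # hypothetical 3 x 2 rank-1 parameter
    X = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]     # hypothetical arm feature matrix
    print(simulate_payoffs(X, Theta, n=1000, tau=20.0))
```

Truncation bounds the influence of any single extreme payoff at the cost of some bias. The fixed `tau` here is only a placeholder; in practice the threshold would presumably depend on the moment bound and the number of samples seen so far.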

Taxonomy

Topics: Advanced Bandit Algorithms Research · Sparse and Compressive Sensing Techniques