Catoni Contextual Bandits are Robust to Heavy-tailed Rewards

Chenlu Ye; Yujia Jin; Alekh Agarwal; Tong Zhang

arXiv:2502.02486 · stat.ML · February 5, 2025

TL;DR

This paper introduces a robust contextual bandit algorithm for heavy-tailed rewards. Its regret bounds depend on the cumulative reward variance and only logarithmically on the reward range $R$, whereas typical methods scale polynomially with $R$.

Contribution

The paper develops algorithms based on Catoni's estimator for contextual bandits with general function approximation. Their regret bounds scale logarithmically, rather than polynomially, with the reward range, and they cover both the known-variance and the unknown-variance settings.

Findings

01

Regret depends only on reward variance and logarithmically on reward range R.

02

Proposed algorithms are robust to heavy-tailed rewards and unknown variances.

03

Matching lower bounds demonstrate the optimality of the regret bounds.

Abstract

Typical contextual bandit algorithms assume that the rewards at each round lie in some fixed range $[0, R]$, and their regret scales polynomially with this reward range $R$. However, many practical scenarios naturally involve heavy-tailed rewards or rewards where the worst-case range can be substantially larger than the variance. In this paper, we develop an algorithmic approach building on Catoni's estimator from robust statistics, and apply it to contextual bandits with general function approximation. When the variance of the reward at each round is known, we use a variance-weighted regression approach and establish a regret bound that depends only on the cumulative reward variance and logarithmically on the reward range $R$ as well as the number of rounds $T$. For the unknown-variance case, we further propose a careful peeling-based algorithm and remove the need for cumbersome…
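For readers unfamiliar with the building block named in the abstract, the following is a minimal sketch of Catoni's mean estimator for scalar samples. It is not the paper's bandit algorithm. The function names `catoni_psi` and `catoni_mean`, the simplified scale `alpha = sqrt(2 log(1/delta) / (n * variance_bound))`, and the bisection solver are illustrative assumptions.

```python
import math


def catoni_psi(x):
    """Catoni's influence function: sign(x) * log(1 + |x| + x^2 / 2).

    It is odd and increasing, and grows only logarithmically in |x|,
    which is what limits the pull of extreme samples.
    """
    return math.copysign(math.log1p(abs(x) + 0.5 * x * x), x)


def catoni_mean(samples, variance_bound, delta=0.05, iters=100):
    """Robust mean estimate: the root theta of sum_i psi(alpha * (x_i - theta)) = 0.

    Assumptions (not taken from the paper): `variance_bound` is a known
    upper bound on the variance, and `alpha` uses a simplified scaling.
    """
    n = len(samples)
    if n == 0:
        raise ValueError("samples must be non-empty")
    if variance_bound <= 0 or not 0 < delta < 1:
        raise ValueError("need variance_bound > 0 and 0 < delta < 1")

    alpha = math.sqrt(2.0 * math.log(1.0 / delta) / (n * variance_bound))

    def score(theta):
        return sum(catoni_psi(alpha * (x - theta)) for x in samples)

    # score is decreasing in theta, non-negative at min(samples) and
    # non-positive at max(samples), so bisection brackets the root.
    lo, hi = min(samples), max(samples)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Usage: 99 zeros and one extreme reward. The sample mean is dragged
# towards the outlier, while the Catoni estimate stays close to zero.
rewards = [0.0] * 99 + [1000.0]
sample_mean = sum(rewards) / len(rewards)
robust_mean = catoni_mean(rewards, variance_bound=1.0)
```

Because the influence function is logarithmic, a single reward of magnitude $R$ shifts the estimate by an amount on the order of $\log R$ rather than $R$, which gives some intuition for the logarithmic dependence on the reward range described above.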

Taxonomy

Topics: Advanced Bandit Algorithms Research · Decision-Making and Behavioral Economics · Auction Theory and Applications