Nonstochastic Bandits with Composite Anonymous Feedback

Nicolò Cesa-Bianchi; Tommaso Cesari; Roberto Colomboni; Claudio Gentile; Yishay Mansour

arXiv:2112.02866·cs.LG·September 27, 2022·21 cites

Open Access

TL;DR

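This paper studies a nonstochastic bandit setting with composite anonymous feedback, in which the loss of each action is spread adversarially over subsequent rounds and the player observes only the per-round sum. It introduces a reduction that adapts standard bandit algorithms to this setting and establishes near-optimal regret bounds for it; bandits with delayed feedback are an easier special case.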

Contribution

It presents a reduction transforming standard bandit algorithms to handle composite anonymous feedback and provides regret bounds for this setting, including a near-matching lower bound.

Findings

01

A reduction bounds the regret of the transformed algorithm in terms of the stability and regret of the original algorithm.

02

A tuned FTRL with Tsallis entropy achieves regret of order √((d+1)KT).

03

A near-matching lower bound shows that the results cannot be substantially improved in general.

Abstract

We investigate a nonstochastic bandit setting in which the loss of an action is not immediately charged to the player, but rather spread over the subsequent rounds in an adversarial way. The instantaneous loss observed by the player at the end of each round is then a sum of many loss components of previously played actions. This setting encompasses as a special case the easier task of bandits with delayed feedback, a well-studied framework where the player observes the delayed losses individually. Our first contribution is a general reduction transforming a standard bandit algorithm into one that can operate in the harder setting: We bound the regret of the transformed algorithm in terms of the stability and regret of the original algorithm. Then, we show that the transformation of a suitably tuned FTRL with Tsallis entropy has a regret of order $\sqrt{(d + 1) K T}$, where $d$ is the…
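The feedback model described in the abstract can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' construction: the names `split_loss`, `simulate`, and `max_delay` are hypothetical, the split of each loss is random rather than adversarial, and the player follows a uniform random placeholder policy instead of a bandit algorithm. It only shows how the per-round observation becomes a sum of components from earlier plays.

```python
import random


def split_loss(loss, max_delay, rng):
    """Split one loss into max_delay + 1 nonnegative parts that sum to loss.

    Assumption: a random split stands in for the adversarial one.
    """
    # The small offset keeps the normalizer strictly positive.
    weights = [rng.random() + 1e-9 for _ in range(max_delay + 1)]
    total = sum(weights)
    return [loss * w / total for w in weights]


def simulate(num_rounds, num_actions, max_delay, seed=0):
    """Return (observed, total_loss) for a uniform random placeholder policy.

    observed[t] is the composite anonymous feedback at round t: the sum of
    all loss components charged at t, with no record of which play caused them.
    """
    rng = random.Random(seed)
    # Hypothetical loss table: losses[t][i] in [0, 1) for action i at round t.
    losses = [[rng.random() for _ in range(num_actions)]
              for _ in range(num_rounds)]
    observed = [0.0] * (num_rounds + max_delay)
    total_loss = 0.0
    for t in range(num_rounds):
        action = rng.randrange(num_actions)  # placeholder, not a bandit algorithm
        loss = losses[t][action]
        total_loss += loss
        # Component s of this loss is charged s rounds later.
        for s, part in enumerate(split_loss(loss, max_delay, rng)):
            observed[t + s] += part
    return observed, total_loss


if __name__ == "__main__":
    observed, total_loss = simulate(num_rounds=50, num_actions=3, max_delay=2)
    print(f"total loss incurred: {total_loss:.3f}")
    print(f"sum of observations: {sum(observed):.3f}")
```

The two printed totals agree up to floating-point error, since every component is eventually charged, yet no single observation reveals the loss of any individual play. Setting `max_delay=0` recovers ordinary bandit feedback, where each observation is exactly the loss of the action just played.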

Taxonomy

Topics: Advanced Bandit Algorithms Research · Adversarial Robustness in Machine Learning · Statistical Mechanics and Entropy