Scale-Free Adversarial Multi-Armed Bandit with Arbitrary Feedback Delays

Jiatai Huang; Yan Dai; Longbo Huang

arXiv:2110.13400 · cs.LG · January 27, 2023

TL;DR

This paper introduces an algorithm for adversarial multi-armed bandit problems in which feedback can be delayed arbitrarily and losses lie in a bounded interval $[-L, L]$ whose scale $L$ is unknown to the agent in advance. The algorithm achieves near-optimal regret bounds.

Contribution

The paper proposes the Scale-Free Delayed INF (SFD-INF) approach, combining a convex combination trick with doubling and skipping techniques, to handle arbitrary delays and scale-free losses.

Findings

01. Achieves near-optimal regret bounds for delayed feedback scenarios.

02. Outperforms existing algorithms in non-delayed scale-free adversarial MAB problems.

03. Provides two instances, SFD-TINF and SFD-LBINF, with different regret guarantees.

Abstract

We consider the Scale-Free Adversarial Multi-Armed Bandit (MAB) problem with unrestricted feedback delays. In contrast to the standard assumption that all losses are $[0, 1]$-bounded, in our setting, losses can fall in a general bounded interval $[-L, L]$, unknown to the agent beforehand. Furthermore, the feedback of each arm pull can experience arbitrary delays. We propose a novel approach named Scale-Free Delayed INF (SFD-INF) for this novel setting, which combines a recent "convex combination trick" together with a novel doubling and skipping technique. We then present two instances of SFD-INF, each with carefully designed delay-adapted learning scales. The first one, SFD-TINF, uses the $\frac{1}{2}$-Tsallis entropy regularizer and can achieve $\widetilde{O}(\sqrt{K(D+T)}\,L)\cdot\mathrm{polylog}(T, L)$ regret when the losses are non-negative, where $K$ is the number of actions, $T$ is the number of steps, and…
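
To make the setting concrete, here is a minimal Python sketch of a generic bandit loop with delayed feedback and a $\frac{1}{2}$-Tsallis entropy regularizer. It is not the paper's SFD-TINF: the function names `tsallis_inf_probs` and `run_delayed`, the fixed learning rate `eta`, and the plain importance-weighted loss estimator are all assumptions made for illustration, and the convex combination, doubling, and skipping components are left out entirely.

```python
import math
import random


def tsallis_inf_probs(cum_loss_est, eta, iters=100):
    """FTRL distribution for the regularizer -2 * sum(sqrt(p_i)).

    The optimality condition gives p_i = (eta * L_i + lam) ** -2, where the
    multiplier lam is found by bisection so that the probabilities sum to 1.
    """
    k = len(cum_loss_est)
    base = min(cum_loss_est)
    # Shift so the best arm sits at 0; this also handles negative losses.
    shifted = [eta * (l - base) for l in cum_loss_est]
    # At lam = 1 the best arm alone has mass 1 (sum >= 1);
    # at lam = sqrt(k) every arm has mass <= 1/k (sum <= 1).
    lo, hi = 1.0, math.sqrt(k)
    for _ in range(iters):
        mid = (lo + hi) / 2
        if sum((s + mid) ** -2 for s in shifted) > 1.0:
            lo = mid
        else:
            hi = mid
    probs = [(s + hi) ** -2 for s in shifted]
    total = sum(probs)
    return [p / total for p in probs]  # renormalize away bisection error


def run_delayed(losses, delays, eta, seed=0):
    """Play len(losses) rounds; the loss of round t is revealed at round t + delays[t]."""
    rng = random.Random(seed)
    k = len(losses[0])
    cum_est = [0.0] * k   # importance-weighted cumulative loss estimates
    pending = {}          # arrival round -> list of (arm, prob, loss)
    total_loss = 0.0
    for t, (loss_vec, d) in enumerate(zip(losses, delays)):
        probs = tsallis_inf_probs(cum_est, eta)
        arm = rng.choices(range(k), weights=probs)[0]
        total_loss += loss_vec[arm]
        pending.setdefault(t + d, []).append((arm, probs[arm], loss_vec[arm]))
        # Only feedback whose delay has elapsed updates the estimates.
        for a, p, l in pending.pop(t, []):
            cum_est[a] += l / p
    return total_loss, cum_est


if __name__ == "__main__":
    # Hypothetical instance: 2 arms, losses in [-1, 1], every feedback delayed by 3 rounds.
    losses = [[1.0, -1.0]] * 50
    delays = [3] * 50
    print(run_delayed(losses, delays, eta=0.1))
```

Feedback scheduled to arrive after the final round is never observed in this sketch. A fixed `eta` also presumes the loss scale is known, which is exactly the assumption the scale-free setting removes.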

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Sparse and Compressive Sensing Techniques