Scale-Free Adversarial Multi-Armed Bandit with Arbitrary Feedback Delays
Jiatai Huang, Yan Dai, Longbo Huang

TL;DR
This paper introduces a novel algorithmic framework for adversarial multi-armed bandit problems with arbitrary feedback delays and losses lying in a general bounded interval that is unknown in advance, achieving near-optimal regret bounds.
Contribution
The paper proposes the Scale-Free Delayed INF (SFD-INF) approach, which combines a recent "convex combination trick" with a novel doubling and skipping technique to handle both arbitrary delays and scale-free losses.
Findings
Achieves near-optimal regret bounds for delayed feedback scenarios.
Improves on the regret guarantees of existing algorithms for non-delayed scale-free adversarial MAB problems.
Provides two instances with different regret guarantees: SFD-TINF, which uses a 1/2-Tsallis entropy regularizer and targets non-negative losses, and SFD-LBINF.
Abstract
We consider the Scale-Free Adversarial Multi-Armed Bandit (MAB) problem with unrestricted feedback delays. In contrast to the standard assumption that all losses are $[0,1]$-bounded, in our setting, losses can fall in a general bounded interval $[-L, L]$, unknown to the agent beforehand. Furthermore, the feedback of each arm pull can experience arbitrary delays. We propose a novel approach named Scale-Free Delayed INF (SFD-INF) for this novel setting, which combines a recent "convex combination trick" together with a novel doubling and skipping technique. We then present two instances of SFD-INF, each with carefully designed delay-adapted learning scales. The first one, SFD-TINF, uses the $\frac{1}{2}$-Tsallis entropy regularizer and can achieve $\widetilde{\mathcal{O}}(\sqrt{K(D+T)}\,L)$ regret when the losses are non-negative, where $K$ is the number of actions, $T$ is the number of steps, and…
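To make the "1/2-Tsallis entropy regularizer" concrete, here is a minimal sketch of the standard 1/2-Tsallis-INF sampling step together with naive delayed importance-weighted updates. This is NOT SFD-TINF: it uses a fixed learning rate `eta` and omits the paper's delay-adapted learning scales, the convex combination trick, and the doubling and skipping technique. The function names `tsallis_half_weights` and `run_delayed`, the regularizer scaling $-\frac{2}{\eta}\sum_i \sqrt{p_i}$, and the convention that round-$t$ feedback becomes usable at round $t + d_t + 1$ are all assumptions made for illustration.

```python
import math
import random


def tsallis_half_weights(cum_loss, eta, iters=100):
    """Hypothetical helper: FTRL step with the 1/2-Tsallis regularizer.

    Solves argmin_p <p, L> - (2/eta) * sum_i sqrt(p_i) over the simplex.
    Its closed form is p_i = 1 / (eta * (L_i - x))**2, where the scalar
    x < min_i L_i is chosen by bisection so that sum_i p_i = 1.
    """
    k = len(cum_loss)
    m = min(cum_loss)
    # At x = m - sqrt(k)/eta every term is <= 1/k, so the sum is <= 1;
    # at x = m - 1/eta the best arm's term alone equals 1, so the sum is >= 1.
    lo, hi = m - math.sqrt(k) / eta, m - 1.0 / eta
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        total = sum(1.0 / (eta * (l - mid)) ** 2 for l in cum_loss)
        if total > 1.0:
            hi = mid  # sum is increasing in x, so the root lies below mid
        else:
            lo = mid
    p = [1.0 / (eta * (l - lo)) ** 2 for l in cum_loss]
    s = sum(p)
    return [q / s for q in p]  # remove residual bisection error


def run_delayed(losses, delays, eta, seed=0):
    """Hypothetical fixed-eta Tsallis-INF loop with delayed bandit feedback.

    losses[t][i] is the loss of arm i at round t; the feedback of round t
    (assumed convention) becomes usable at the start of round t + delays[t] + 1.
    Returns the total loss actually incurred by the sampled arms.
    """
    rng = random.Random(seed)
    k = len(losses[0])
    cum_est = [0.0] * k  # cumulative importance-weighted loss estimates
    pending = {}  # arrival round -> list of (arm, estimated loss)
    total_loss = 0.0
    for t, row in enumerate(losses):
        # Fold in every estimate whose delayed feedback arrives now.
        for arm, est in pending.pop(t, []):
            cum_est[arm] += est
        p = tsallis_half_weights(cum_est, eta)
        arm = rng.choices(range(k), weights=p)[0]
        total_loss += row[arm]
        # Unbiased importance-weighted estimate for the pulled arm,
        # held back until its feedback arrives.
        pending.setdefault(t + delays[t] + 1, []).append((arm, row[arm] / p[arm]))
    return total_loss
```

For example, `tsallis_half_weights([0.0, 5.0, 10.0], eta=0.5)` puts the most mass on the arm with the smallest cumulative loss estimate. Bisection is used instead of the Newton iteration common in Tsallis-INF implementations only because its bracket is easy to verify.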
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Sparse and Compressive Sensing Techniques
