Non-stochastic Bandits With Evolving Observations

Yogev Bar-On; Yishay Mansour

arXiv:2405.16843 · cs.LG · May 28, 2024

TL;DR

This paper introduces a new online learning framework for non-stochastic bandits with evolving, adversarial feedback, providing algorithms with regret bounds that adapt to feedback accuracy and unify several existing models.

Contribution

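The paper presents a unified framework for non-stochastic bandits with evolving observations and develops regret minimization algorithms for both the full-information and bandit settings, with regret bounds expressed in terms of the average feedback accuracy relative to the true loss.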

Findings

01. Algorithms match known regret bounds in special cases

02. New regret bounds are established for evolving feedback scenarios

03. Framework generalizes delayed and corrupted feedback models

Abstract

We introduce a novel online learning framework that unifies and generalizes pre-established models, such as delayed and corrupted feedback, to encompass adversarial environments where action feedback evolves over time. In this setting, the observed loss is arbitrary and may not correlate with the true loss incurred, with each round updating previous observations adversarially. We propose regret minimization algorithms for both the full-information and bandit settings, with regret bounds quantified by the average feedback accuracy relative to the true loss. Our algorithms match the known regret bounds across many special cases, while also introducing previously unknown bounds.
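
To make the setting in the abstract concrete, the following is a minimal sketch of a full-information learner whose view of past losses can change from round to round. It is not the paper's algorithm: it is plain exponential weights (Hedge) recomputed each round from the latest observed version of every past loss vector. The callback `observe(s, t)`, the function name, and the encoding of delayed feedback as an all-zero observation until the delay has passed are assumptions made for illustration only.

```python
import math
import random


def hedge_with_evolving_observations(true_losses, observe, eta):
    """Exponential weights on the most recent observation of each past round.

    true_losses: list of T loss vectors (each of length K, entries in [0, 1]).
    observe(s, t): hypothetical callback returning the loss vector of round s
        as it is observed at the end of round t (s <= t). It may differ from
        true_losses[s] and may change as t grows.
    eta: learning rate.
    Returns the learner's expected true loss in each round.
    """
    T = len(true_losses)
    K = len(true_losses[0])
    expected = []
    for t in range(T):
        # Rebuild the cumulative observed loss from scratch, because
        # observations of earlier rounds may have been revised.
        cum = [0.0] * K
        for s in range(t):
            obs = observe(s, t - 1)
            for a in range(K):
                cum[a] += obs[a]
        base = min(cum)  # shift for numerical stability
        weights = [math.exp(-eta * (c - base)) for c in cum]
        total = sum(weights)
        probs = [w / total for w in weights]
        expected.append(sum(p * l for p, l in zip(probs, true_losses[t])))
    return expected


# Usage: delayed feedback as one special case (assumed encoding).
random.seed(0)
T, K, DELAY = 200, 3, 5
true_losses = [
    [random.random() * (0.5 if a == 0 else 1.0) for a in range(K)]
    for _ in range(T)
]


def delayed(s, t):
    # Round s is observed as all zeros until DELAY rounds have passed,
    # after which the true loss vector is revealed.
    return true_losses[s] if t - s >= DELAY else [0.0] * K


eta = math.sqrt(math.log(K) / T)
exp_loss = hedge_with_evolving_observations(true_losses, delayed, eta)
best_fixed = min(sum(row[a] for row in true_losses) for a in range(K))
regret = sum(exp_loss) - best_fixed
```

Swapping `delayed` for a callback that returns perturbed or progressively corrected loss vectors would give a toy version of corrupted or evolving feedback. The quadratic recomputation keeps the sketch short; an incremental update that tracks only the changed observations would be the natural optimisation.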

Peer Reviews

Decision · ALT 2025

Reviewer 01 · Rating: Accept · Confidence: 4

Reviewer 02 · Rating: 6 · Confidence: 3

Reviewer 03 · Rating: 7 · Confidence: 3

Reviewer 04 · Rating: 8 · Confidence: 3

Taxonomy

Topics: Advanced Bandit Algorithms Research · Distributed Sensor Networks and Detection Algorithms · Data Stream Mining Techniques