No-Regret is not enough! Bandits with General Constraints through Adaptive Regret Minimization

Martino Bernasconi; Matteo Castiglioni; Andrea Celli

arXiv:2405.06575·cs.LG·May 13, 2024

Open Access

TL;DR

This paper introduces a novel approach for bandits with general long-term constraints using weakly adaptive primal and dual regret minimizers, achieving sublinear regret and constraint violations in both stochastic and adversarial settings.

Contribution

It demonstrates that weakly adaptive algorithms can ensure bounded dual variables and achieve optimal guarantees for constrained bandit problems without prior knowledge of problem parameters.

Findings

01

Achieves sublinear regret in stochastic bandits with constraints.

02

Provides a tight competitive ratio of $ρ/(1+ρ)$ in adversarial settings.

03

First to offer no-$α$-regret guarantees for adversarial contextual bandits.
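
The guarantees listed above can be written compactly. The notation below is an assumption made for illustration, not the paper's own: $f_t$ is the reward function at round $t$, $g_{t,i}$ is the $i$-th of $m$ constraint functions, $a_t$ is the action played, $T$ is the horizon, $V_T$ is the cumulative violation, and $\mathrm{OPT}$ is the cumulative reward of the best fixed strategy that satisfies the constraints.

$$
V_T = \max_{i \in \{1,\dots,m\}} \sum_{t=1}^{T} g_{t,i}(a_t) = o(T),
\qquad
\sum_{t=1}^{T} f_t(a_t) \;\ge\; \frac{\rho}{1+\rho}\,\mathrm{OPT} - o(T).
$$

Under this reading, the second inequality is the adversarial competitive-ratio guarantee, and the first states that constraint violations grow sublinearly in $T$.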

Abstract

In the bandits with knapsacks (BwK) framework, the learner has $m$ resource-consumption (packing) constraints. We focus on the generalization of BwK in which the learner has a set of general long-term constraints. The goal of the learner is to maximize their cumulative reward while keeping cumulative constraint violations small. In this scenario, there exist simple instances where conventional methods for BwK fail to yield sublinear constraint violations. We show that it is possible to circumvent this issue by requiring the primal and dual algorithms to be weakly adaptive. Indeed, even in the absence of any information on Slater's parameter $ρ$ characterizing the problem, the interplay between weakly adaptive primal and dual regret minimizers yields a "self-bounding" property of dual variables. In particular, their norm remains suitably upper bounded across the…
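
To make the primal-dual interplay described in the abstract concrete, here is a minimal Python sketch of a generic Lagrangian bandit loop. It is not the paper's algorithm: the primal player is a plain EXP3-style update with uniform mixing, the dual player is gradient ascent clipped at zero, and neither is claimed to be weakly adaptive. The function name `primal_dual_bandit`, its parameters, and the convention that constraint $i$ is satisfied when its cumulative cost is at most zero are all assumptions made for illustration.

```python
import numpy as np

def primal_dual_bandit(rewards, costs, eta_primal=0.05, eta_dual=0.05,
                       mix=0.05, seed=0):
    """Generic primal-dual bandit loop (illustrative sketch only).

    rewards: array of shape (T, K), reward of each arm at each round.
    costs:   array of shape (T, K, m), cost of each arm for each constraint;
             constraint i is assumed to require cumulative cost <= 0.
    Returns total reward, cumulative cost per constraint, and the 1-norm of
    the dual variable after every round.
    """
    rng = np.random.default_rng(seed)
    T, K = rewards.shape
    m = costs.shape[2]
    log_w = np.zeros(K)          # primal log-weights over arms
    lam = np.zeros(m)            # dual variables, one per constraint
    total_reward = 0.0
    total_cost = np.zeros(m)
    lam_norms = np.empty(T)

    for t in range(T):
        # Primal step: sample an arm from exponential weights with mixing.
        p = np.exp(log_w - log_w.max())
        p /= p.sum()
        p = (1.0 - mix) * p + mix / K
        a = rng.choice(K, p=p)

        r, g = rewards[t, a], costs[t, a]

        # Bandit feedback: importance-weighted estimate of the Lagrangian
        # payoff r - <lam, g> for the arm that was actually played.
        estimate = np.zeros(K)
        estimate[a] = (r - lam @ g) / p[a]
        log_w += eta_primal * estimate

        # Dual step: gradient ascent on the observed cost, clipped at zero.
        # Note there is no projection onto a bounded set.
        lam = np.maximum(0.0, lam + eta_dual * g)

        total_reward += r
        total_cost += g
        lam_norms[t] = np.abs(lam).sum()

    return total_reward, total_cost, lam_norms
```

Calling the function on a small instance (for example two arms, one of which earns reward but violates the constraint while the other earns nothing and restores slack) and plotting `lam_norms` gives a rough picture of how the dual variable evolves without an explicit upper bound. The paper's claim concerns weakly adaptive regret minimizers, which this sketch does not implement.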

Taxonomy

Topics: Advanced Bandit Algorithms Research · Forecasting Techniques and Applications

Methods: Sparse Evolutionary Training · Focus