Bandits in Flux: Adversarial Constraints in Dynamic Environments

Tareq Si Salem

arXiv:2601.19867 · cs.LG · January 28, 2026

TL;DR

This paper introduces a primal-dual algorithm for adversarial multi-armed bandits with time-varying constraints, achieving sublinear regret and constraint violation, and demonstrating superior empirical performance.

Contribution

The paper proposes a novel primal-dual algorithm extending online mirror descent to handle adversarial bandits with dynamic constraints, providing theoretical guarantees and empirical validation.

Findings

01. Achieves sublinear dynamic regret and sublinear constraint violation.

02. Outperforms existing methods in empirical evaluations.

03. Provides theoretical guarantees for the constrained adversarial bandit setting.
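The block below gives one standard way to formalize the regret and violation guarantees listed above. It assumes linear losses and a single linear budget constraint per round. The notation ($K$ arms, simplex $\Delta_K$, loss vectors $\ell_t$, cost vectors $c_t$, budget $b$) is chosen for illustration, and the paper's exact definitions may differ.

```latex
% Assumed setting (illustrative, not taken from the paper):
% mixed strategy x_t \in \Delta_K, losses \ell_t \in [0,1]^K, costs c_t \in [0,1]^K, budget b.
\mathcal{X}_t = \{\, x \in \Delta_K : \langle c_t, x \rangle \le b \,\}, \qquad
x_t^\star \in \arg\min_{x \in \mathcal{X}_t} \langle \ell_t, x \rangle

% Dynamic regret against the per-round best feasible strategy
\mathrm{DR}_T = \mathbb{E}\Big[ \sum_{t=1}^{T} \langle \ell_t, x_t \rangle \Big]
              - \sum_{t=1}^{T} \langle \ell_t, x_t^\star \rangle

% Cumulative constraint violation (positive part taken per round, so no cancellation)
V_T = \mathbb{E}\Big[ \sum_{t=1}^{T} \big( \langle c_t, x_t \rangle - b \big)_+ \Big]

% "Sublinear" means both grow slower than the horizon:
\mathrm{DR}_T = o(T), \qquad V_T = o(T)
```

Dynamic-regret bounds are usually stated in terms of how much the comparator sequence $x_t^\star$ moves over time, for example its path length. Some works also measure violation as the positive part of the cumulative sum rather than the sum of per-round positive parts. The second version allows rounds with slack to offset rounds that violate the constraint.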

Abstract

We investigate the challenging problem of adversarial multi-armed bandits operating under time-varying constraints, a scenario motivated by numerous real-world applications. To address this complex setting, we propose a novel primal-dual algorithm that extends online mirror descent through the incorporation of suitable gradient estimators and effective constraint handling. We provide theoretical guarantees establishing sublinear dynamic regret and sublinear constraint violation for our proposed policy. Our algorithm achieves state-of-the-art performance in terms of both regret and constraint violation. Empirical evaluations demonstrate the superiority of our approach.
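The primal-dual pattern in the abstract can be illustrated with a minimal sketch under stated assumptions. The sketch is a generic EXP3-style Lagrangian method, not the paper's algorithm. Exponential weights, which is online mirror descent with the negative-entropy mirror map, runs on importance-weighted estimates of loss plus λ times cost. Meanwhile, λ takes projected gradient-ascent steps on the observed budget slack. The function names, the step sizes `eta` and `mu`, the exploration rate `gamma`, the budget, and the toy environment are all hypothetical.

```python
import math
import random


def normalize_log_weights(log_w):
    """Turn log-weights into a probability vector (max-shift avoids overflow)."""
    m = max(log_w)
    w = [math.exp(v - m) for v in log_w]
    s = sum(w)
    return [v / s for v in w]


def primal_dual_exp3(losses, costs, budget, eta, mu, gamma, seed=0):
    """Hypothetical primal-dual bandit sketch, not the paper's algorithm.

    losses[t][i] and costs[t][i] lie in [0, 1]; round t's constraint is
    <costs[t], x> <= budget. Only the pulled arm's loss and cost are observed.
    Returns the per-round sampling distributions and dual variables.
    """
    rng = random.Random(seed)
    k = len(losses[0])
    log_w = [0.0] * k   # entropic mirror descent over the simplex = exponential weights
    lam = 0.0           # Lagrange multiplier (dual variable), kept >= 0
    dists, lams = [], []
    for loss_t, cost_t in zip(losses, costs):
        q = normalize_log_weights(log_w)
        # Uniform exploration keeps every p[i] >= gamma / k, bounding the estimates.
        p = [(1 - gamma) * qi + gamma / k for qi in q]
        a = rng.choices(range(k), weights=p)[0]
        dists.append(p)
        lams.append(lam)
        # Importance-weighted Lagrangian estimate: (loss + lam * cost) / p[a] at the
        # pulled arm, zero elsewhere; unbiased for loss_t + lam * cost_t given the past.
        log_w[a] -= eta * (loss_t[a] + lam * cost_t[a]) / p[a]
        # Projected dual ascent on the observed slack cost_t[a] - budget, whose
        # conditional expectation is <cost_t, p> - budget.
        lam = max(0.0, lam + mu * (cost_t[a] - budget))
    return dists, lams


# Toy time-varying environment (hypothetical): the cheap and the expensive arm
# swap roles halfway through the horizon.
T, K, BUDGET = 2000, 3, 0.5
losses = [[0.1, 0.5, 0.9] if t < T // 2 else [0.9, 0.5, 0.1] for t in range(T)]
costs = [[0.9, 0.3, 0.1] if t < T // 2 else [0.1, 0.3, 0.9] for t in range(T)]

dists, lams = primal_dual_exp3(losses, costs, BUDGET, eta=0.05, mu=0.05, gamma=0.05)

# Expected cumulative violation: sum over rounds of (<cost_t, p_t> - budget)_+.
violation = sum(
    max(0.0, sum(pi * ci for pi, ci in zip(p, c)) - BUDGET)
    for p, c in zip(dists, costs)
)
print(f"expected cumulative violation over {T} rounds: {violation:.2f}")
```

In this sketch, λ rises whenever the pulled arm's cost exceeds the budget. A larger λ makes costly arms look worse to the primal update. The paper's gradient estimators, step-size schedules, and treatment of time-varying constraints may differ from these choices.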

Taxonomy

Topics: Advanced Bandit Algorithms Research · Adversarial Robustness in Machine Learning · Reinforcement Learning in Robotics