TL;DR
This paper introduces an online optimization framework for maximizing reward subject to long-term constraints, with guarantees in both stochastic and adversarial settings and under full or bandit feedback.
Contribution
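It presents the first best-of-both-worlds algorithm for online learning with long-term constraints: it has no-regret guarantees whether rewards and constraints are stochastic or adversarial, and it is the first to give adversarial guarantees measured against the optimal fixed strategy that satisfies the long-term constraints.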
Findings
In the adversarial setting, guarantees a fraction of the optimal reward, determined by a feasibility parameter, together with sublinear regret.
Handles non-convex rewards and constraints.
Applicable to budget management in repeated auctions.
Abstract
We study online learning problems in which a decision maker has to take a sequence of decisions subject to long-term constraints. The goal of the decision maker is to maximize their total reward, while at the same time achieving small cumulative constraint violation across the rounds. We present the first best-of-both-worlds algorithm for this general class of problems, with no-regret guarantees both in the case in which rewards and constraints are selected according to an unknown stochastic model, and in the case in which they are selected at each round by an adversary. Our algorithm is the first to provide guarantees in the adversarial setting with respect to the optimal fixed strategy that satisfies the long-term constraints. In particular, it guarantees a ρ/(1+ρ) fraction of the optimal reward and sublinear regret, where ρ is a feasibility parameter related…
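The two quantities the abstract trades off, total reward and cumulative constraint violation, can be made concrete with a small sketch. It is an illustration only, not the paper's algorithm. The function names are hypothetical, and the convention that a constraint is satisfied on a round when its cost is at most zero is an assumption made here, not something the summary states.

```python
from typing import Sequence


def regret(benchmark_rewards: Sequence[float],
           learner_rewards: Sequence[float]) -> float:
    """Total reward of a fixed benchmark strategy minus the learner's total reward."""
    return sum(benchmark_rewards) - sum(learner_rewards)


def cumulative_violation(costs: Sequence[Sequence[float]]) -> float:
    """Worst cumulative violation across constraints.

    costs[t][i] is the cost of constraint i at round t. Assumed convention:
    a value <= 0 means the constraint holds on that round, so negative costs
    on some rounds can offset positive costs on others.
    """
    if not costs:
        return 0.0
    num_constraints = len(costs[0])
    # Sum each constraint's cost over all rounds.
    totals = [sum(round_costs[i] for round_costs in costs)
              for i in range(num_constraints)]
    # Only the positive part counts as a violation.
    return max(0.0, max(totals))


if __name__ == "__main__":
    learner = [0.5, 0.25, 0.75]       # rewards collected by the learner
    benchmark = [0.75, 0.75, 0.75]    # rewards of a fixed comparison strategy
    costs = [[0.5, -0.25], [-0.25, -0.25], [0.25, -0.5]]
    print("regret:", regret(benchmark, learner))
    print("violation:", cumulative_violation(costs))
```

A no-regret guarantee with long-term constraints means that both quantities grow sublinearly in the number of rounds, so their per-round averages vanish as the horizon grows.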