Bandits with Replenishable Knapsacks: the Best of both Worlds
Martino Bernasconi, Matteo Castiglioni, Andrea Celli, Federico Fusco

TL;DR
This paper introduces a framework for online decision-making with replenishable resources. It gives guarantees in both adversarial and stochastic settings (a constant competitive ratio and an $\tilde{O}(T^{1/2})$ regret bound, respectively) and applies to economic problems.
Contribution
It extends the bandits with knapsack (BwK) model to allow resource replenishment and provides a primal-dual algorithm with competitive-ratio guarantees for adversarial inputs and regret bounds for stochastic inputs.
Findings
Guarantees a constant competitive ratio for adversarial inputs under certain conditions.
Achieves an $\tilde{O}(T^{1/2})$ regret bound in stochastic models.
Applicable to practical economic decision problems.
Abstract
The bandits with knapsack (BwK) framework models online decision-making problems in which an agent makes a sequence of decisions subject to resource consumption constraints. The traditional model assumes that each action consumes a non-negative amount of resources and the process ends when the initial budgets are fully depleted. We study a natural generalization of the BwK framework which allows non-monotonic resource utilization, i.e., resources can be replenished by a positive amount. We propose a best-of-both-worlds primal-dual template that can handle any online learning problem with replenishment for which a suitable primal regret minimizer exists. In particular, we provide the first positive results for the case of adversarial inputs by showing that our framework guarantees a constant competitive ratio $\alpha$ when $B=\Omega(T)$ or when the possible per-round replenishment is a…
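The primal-dual idea described in the abstract can be illustrated with a minimal sketch. This is not the paper's Algorithm 1: it is a generic Lagrangian loop under stated assumptions. The function name `primal_dual_sketch`, the choice of EXP3 as the primal regret minimizer, projected gradient ascent for the dual multiplier, the step sizes, and the rule "play a designated replenishing `void` arm whenever the budget drops below 1" are all illustrative choices, not details taken from the source. Costs lie in [-1, 1], and a negative cost replenishes the resource.

```python
import math
import random


def primal_dual_sketch(T, B, arms, void=0, seed=0):
    """Generic Lagrangian primal-dual loop (illustrative, not the paper's algorithm).

    arms: list of (mean_reward, cost), reward in [0, 1], cost in [-1, 1].
    A negative cost replenishes the resource; arms[void] must have cost <= 0.
    Returns (total_reward, lowest_budget_seen, final_multiplier).
    """
    assert arms[void][1] <= 0, "the void arm must not consume resources"
    rng = random.Random(seed)
    K = len(arms)
    rho = B / T                       # per-round budget target
    lam_max = 1.0 / rho               # cap on the Lagrange multiplier (assumed)
    gamma = min(1.0, math.sqrt(K * math.log(K) / T))  # EXP3 exploration rate
    eta_dual = 1.0 / math.sqrt(T)     # dual step size (assumed)
    log_w = [0.0] * K                 # EXP3 log-weights
    lam, budget, total, low = 0.0, float(B), 0.0, float(B)

    for _ in range(T):
        # Primal player: EXP3 distribution over arms.
        top = max(log_w)
        w = [math.exp(x - top) for x in log_w]
        s = sum(w)
        probs = [(1 - gamma) * wi / s + gamma / K for wi in w]

        # If the worst-case cost (1) is unaffordable, force the void arm.
        forced = budget < 1.0
        arm = void if forced else rng.choices(range(K), weights=probs)[0]

        mean_r, cost = arms[arm]
        reward = 1.0 if rng.random() < mean_r else 0.0
        budget -= cost                # negative cost increases the budget
        total += reward
        low = min(low, budget)

        if not forced:
            # Lagrangian payoff, rescaled to [0, 1] for the EXP3 update.
            payoff = reward - lam * cost
            gain = (payoff + lam_max) / (1.0 + 2.0 * lam_max)
            log_w[arm] += gamma * (gain / probs[arm]) / K

        # Dual player: projected gradient ascent on the multiplier.
        lam = min(lam_max, max(0.0, lam + eta_dual * (cost - rho)))

    return total, low, lam


# Hypothetical instance: a replenishing void arm, a costly arm, a cheap arm.
arms = [(0.0, -0.5), (0.9, 1.0), (0.4, 0.2)]
total, low, lam = primal_dual_sketch(T=2000, B=100, arms=arms)
print(total, low, lam)
```

Because a non-void arm is played only when the budget is at least 1 and every cost is at most 1, the budget in this sketch never goes negative; the forced void rounds are where replenishment lets the process continue instead of stopping as in classical BwK.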
Peer Reviews
Decision: ICLR 2024 poster
1. This paper proposes and studies a novel and interesting problem. It is clever to consider the case where the cost of an arm can be negative.
2. The results are rich and good. The paper studies multiple cases and provides good results for each. The results look good enough to my understanding and knowledge.
This paper has one major weakness and one minor weakness in presentation. 1. It is not well discussed or stated how negative costs affect the problem. Does this new setup invalidate the existing algorithms, or does it create major challenges that prevent simply reusing the existing algorithms or regret analysis techniques? The algorithm looks quite similar to existing work, as do the regret bounds. From the current writing, I am not able to tell how hard the new setting is. 2. Th
- Well-motivated model formulation. BwRK is an interesting extension of BwK, and the authors provide application examples as well (Sec. 6).
- Algorithmic framework. The authors propose a primal-dual template (Alg. 1) that can be applied with various minimizers under different scenarios.
- The theoretical results are not clearly discussed. For example, how tight are the theoretical results compared to lower bounds?
- Lack of experiments. It would be interesting to know the empirical performance of the proposed algorithm. In particular, it would help to compare the actual performance of this paper's algorithm with known BwK algorithms when $\beta=0$. Are they exactly the same algorithms?
The problem setup is clearly motivated and introduced. The algorithm and analysis are thoroughly explained and neatly presented. The regret bounds improve on existing results.
I believe there is some related work that would be beneficial to add to the comparison. The line of work I would like to mention is bandits with (soft) constraints. The authors argue that $\textit{in BwK problems constraints are required to be satisfied strictly at all rounds}$. However, allowing negative $c_t$s essentially allows constraint violation in those rounds.
Taxonomy
Topics: Advanced Bandit Algorithms Research · Optimization and Search Problems · Auction Theory and Applications
