Bandits with Replenishable Knapsacks: the Best of both Worlds

Martino Bernasconi; Matteo Castiglioni; Andrea Celli; Federico Fusco

arXiv:2306.08470·cs.LG·June 16, 2023·1 cites

Open Access 3 Reviews

TL;DR

This paper introduces a new framework for online decision-making with replenishable resources, achieving strong theoretical guarantees in both adversarial and stochastic settings, and applicable to economic problems.

Contribution

It extends the bandits with knapsack model to include resource replenishment and provides a primal-dual algorithm with competitive guarantees and regret bounds.

Findings

01

Guarantees a constant competitive ratio $α$ against adversarial inputs, for instance when $B = Ω(T)$.

02

Achieves an $\tilde{O}(T^{1/2})$ regret bound in stochastic models.

03

Applicable to practical economic decision problems.

Abstract

The bandits with knapsack (BwK) framework models online decision-making problems in which an agent makes a sequence of decisions subject to resource consumption constraints. The traditional model assumes that each action consumes a non-negative amount of resources and the process ends when the initial budgets are fully depleted. We study a natural generalization of the BwK framework which allows non-monotonic resource utilization, i.e., resources can be replenished by a positive amount. We propose a best-of-both-worlds primal-dual template that can handle any online learning problem with replenishment for which a suitable primal regret minimizer exists. In particular, we provide the first positive results for the case of adversarial inputs by showing that our framework guarantees a constant competitive ratio $α$ when $B = Ω (T)$ or when the possible per-round replenishment is a…
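To make the primal-dual idea concrete, here is a minimal Python sketch of a generic Lagrangian loop with a budget that can be replenished. It is not the paper's algorithm: the choice of EXP3 as the primal regret minimizer, projected gradient ascent as the dual update, the cap `lam_max = T / B` on the dual variable, the rule "play a void action when the budget drops below 1", and every name (`primal_dual_sketch`, `reward_fn`, `cost_fn`, `void_cost`) are assumptions made for illustration. Rewards are assumed to lie in [0, 1] and costs in [-1, 1], where a negative cost replenishes the resource.

```python
import math
import random


def primal_dual_sketch(reward_fn, cost_fn, n_arms, T, B, void_cost=0.0, seed=0):
    """Illustrative Lagrangian primal-dual loop (an assumption-laden sketch).

    reward_fn(arm, t) -> float in [0, 1]
    cost_fn(arm, t)   -> float in [-1, 1]; negative values replenish the budget
    void_cost         -> hypothetical non-positive cost of skipping a round
    """
    rng = random.Random(seed)
    rho = B / T                        # per-round budget target
    lam, lam_max = 0.0, 1.0 / rho      # dual variable and its assumed cap
    eta_dual = 1.0 / math.sqrt(T)      # dual step size (assumed)
    gamma = min(1.0, math.sqrt(n_arms * math.log(max(n_arms, 2)) / T))
    log_w = [0.0] * n_arms             # EXP3 log-weights
    budget = min_budget = float(B)
    total_reward = 0.0

    for t in range(T):
        if budget < 1.0:
            # Cannot afford a worst-case cost of 1: play the void action.
            budget -= void_cost
            continue

        # Primal step: EXP3 distribution with uniform exploration.
        m = max(log_w)
        w = [math.exp(x - m) for x in log_w]
        s = sum(w)
        probs = [(1 - gamma) * wi / s + gamma / n_arms for wi in w]
        arm = rng.choices(range(n_arms), weights=probs)[0]

        r, c = reward_fn(arm, t), cost_fn(arm, t)
        total_reward += r
        budget -= c
        min_budget = min(min_budget, budget)

        # Lagrangian payoff r + lam * (rho - c), rescaled into [0, 1].
        g = r + lam * (rho - c)
        g_norm = (g + lam_max) / (2.0 + 2.0 * lam_max)
        log_w[arm] += (gamma / n_arms) * g_norm / probs[arm]

        # Dual step: raise the price when spending exceeds the target rate.
        lam = min(lam_max, max(0.0, lam + eta_dual * (c - rho)))

    return {"reward": total_reward, "budget": budget,
            "min_budget": min_budget, "lam": lam, "lam_max": lam_max}
```

A call such as `primal_dual_sketch(lambda a, t: [0.9, 0.2][a], lambda a, t: [0.8, -0.3][a], n_arms=2, T=2000, B=200)` pits a profitable but costly arm against a low-reward arm that replenishes the budget. The dual variable acts as a price on consumption, so a replenishing arm becomes more attractive to the primal learner as the price rises.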

Peer Reviews

Decision · ICLR 2024 poster

Reviewer 01 · Rating 6 (marginally above the acceptance threshold) · Confidence 4

Strengths

1. This paper proposes and studies a novel and interesting problem. It is pretty smart to consider the case where the cost of an arm could be negative. 2. The results are rich and good. It studies multiple cases, and provides good results for them. The results look good enough from my understanding and knowledge.

Weaknesses

This paper has one major weakness and one minor weakness concerning presentation. 1. It is not well discussed or stated how the negative cost affects the problem. Does this new setup invalidate the existing algorithms, or does it create major challenges that prevent simply reusing the existing algorithms or regret analysis techniques? The algorithm looks quite similar to those in existing works, as do the regret bounds. From the current writing, I am not able to tell how hard the new setting is. 2. Th

Reviewer 02 · Rating 6 (marginally above the acceptance threshold) · Confidence 3

Strengths

- Well-motivated model formulation. BwRK is an interesting extension of BwK, and the authors provide application examples as well (Sec. 6). - Algorithmic framework. The authors propose a primal-dual template (Alg. 1) that can be applied with various minimizers under different scenarios.

Weaknesses

- The theoretical results are not clearly discussed. For example, how tight are the theoretical results compared to lower bounds? - Lack of experiments. It would be interesting to know the empirical performance of the proposed algorithm, and especially to compare the actual performance of this paper's algorithm with known ones in BwK when $\beta=0$. Are they exactly the same algorithms?

Reviewer 03 · Rating 6 (marginally above the acceptance threshold) · Confidence 3

Strengths

The problem setup is clearly motivated and introduced. The algorithm and analysis are thoroughly explained and neatly presented. The regret bounds improve on existing results.

Weaknesses

I believe there is some related work that would be beneficial to add to the comparison. The line of work I would like to mention is bandits with (soft) constraints. The authors argue that $\textit{in BwK problems constraints are required to be satisfied strictly at all rounds}$. However, allowing negative $c_t$s essentially allows constraint violation in those rounds.

Taxonomy

Topics: Advanced Bandit Algorithms Research · Optimization and Search Problems · Auction Theory and Applications