Bandits with Anytime Knapsacks
Eray Can Elumar, Cem Tekin, Osman Yagan

TL;DR
This paper introduces a new bandit problem with anytime constraints, proposes an adaptive algorithm called SUAK that balances exploration and exploitation while respecting ongoing cost constraints, and demonstrates its effectiveness through theoretical bounds and simulations.
Contribution
It formulates the bandits with anytime knapsacks (BwAK) problem, proposes the SUAK algorithm, and proves that SUAK attains the same problem-dependent regret upper bound as established for the classic BwK setting.
Findings
SUAK achieves an $O(K \log T)$ regret bound.
SUAK effectively manages the anytime cost constraint in practice.
Simulations confirm SUAK's practical utility.
Abstract
We consider bandits with anytime knapsacks (BwAK), a novel version of the BwK problem where there is an \textit{anytime} cost constraint instead of a total cost budget. This problem setting introduces additional complexities as it mandates adherence to the constraint throughout the decision-making process. We propose SUAK, an algorithm that utilizes upper confidence bounds to identify the optimal mixture of arms while maintaining a balance between exploration and exploitation. SUAK is an adaptive algorithm that strategically utilizes the available budget in each round in the decision-making process and skips a round when it is possible to violate the anytime cost constraint. In particular, SUAK slightly under-utilizes the available cost budget to reduce the need for skipping rounds. We show that SUAK attains the same problem-dependent regret upper bound of $O(K \log T)$ established in…
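The skipping idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' SUAK algorithm: it omits the upper-confidence-bound arm selection entirely and only shows a conservative skip rule. It assumes, hypothetically, that the anytime constraint has the form "cumulative cost through round t is at most per_round_budget * t", that each cost lies in [0, 1], and that skipping a round incurs zero cost. The names `should_skip`, `run`, `per_round_budget`, and `mean_cost` are illustrative and do not come from the paper.

```python
import random


def should_skip(spent, t, per_round_budget, max_cost=1.0):
    """Return True if a worst-case cost in round t could break the
    assumed anytime constraint spent <= per_round_budget * t."""
    return spent + max_cost > per_round_budget * t


def run(num_rounds, per_round_budget, mean_cost, seed=0):
    """Play a single arm with Bernoulli costs, skipping unsafe rounds.

    Returns the total cost spent and the number of skipped rounds.
    """
    rng = random.Random(seed)
    spent, skips = 0.0, 0
    for t in range(1, num_rounds + 1):
        if should_skip(spent, t, per_round_budget):
            skips += 1  # assumed null action: zero cost, zero reward
            continue
        # Hypothetical cost model: cost is 1 with probability mean_cost, else 0.
        cost = 1.0 if rng.random() < mean_cost else 0.0
        spent += cost
        # The skip rule guarantees the assumed constraint holds after playing.
        assert spent <= per_round_budget * t + 1e-9
    return spent, skips
```

Calling `run(1000, 0.5, 0.4)` and `run(1000, 0.5, 0.5)` and comparing the returned skip counts gives a rough feel for why spending slightly below the per-round budget can reduce the need to skip, which is the intuition the abstract gives for SUAK's under-utilization of the budget.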
Peer Reviews
Decision · Submitted to ICLR 2025
Bandits with knapsack is a relevant topic for ICLR. The paper is fairly well written, and the authors made an effort to address the obvious questions concerning their model.
Although somewhat natural, the idea of studying anytime constraints is pretty incremental with respect to previous work. I am not saying that the problem is immediately solvable by algorithms in the literature, but the algorithmic approach, i.e., “skipping rounds where the constraints may be violated + underspend a bit to minimize skips” is somewhat natural. The authors did address the natural question regarding standard algorithms complemented with a skipping strategy (see Section 3.1.), but
The paper is clearly written and easy to digest. The bandit with knapsack framework is interesting and this paper proposes a new model.
My main issue is understanding the connection with prior literature. I strongly believe that there are easy reductions from existing works (admittedly very recent) that can obtain the same results as your algorithm. For example, take any algorithm that satisfies the constraints in high probability apart from O(sqrt(T)) violation (at any time!). Why can't you instantiate an instance of your problem with B-sqrt(T) initial budget and use any of these algorithms? Also, I'm not convinced that the sk
- The paper is relatively easy to follow. Writing is mostly clear.
- The model considered is simple yet interesting and important. It has not been well studied in previous work.
- The analysis is largely sound and rigorous.

- Algorithm design components and novelty need further investigation.
- Assumptions might be too strong and more results might be needed to gain deeper understanding.
- Experiment details need clarification.
- Proof requires great polishing. See Questions for details.
Taxonomy
Topics: Advanced Bandit Algorithms Research · Spreadsheets and End-User Computing
