Cost-Ordered Feasibility for Multi-Armed Bandits with Cost Subsidy
Ishank Juneja, Carlee Joe-Wong, Osman Yağan

TL;DR
This paper introduces the Cost-Ordered Feasibility (COF) algorithm for multi-armed bandits with cost subsidy (MAB-CS), where the goal is to minimize cost subject to a minimum-reward constraint, and provides theoretical bounds and empirical validation on real datasets.
Contribution
It develops a novel algorithm with instance-dependent bounds for cost-constrained bandits, extending prior theoretical results and demonstrating improved empirical performance.
Findings
COF achieves lower expected cumulative cost compared to baselines.
Theoretical bounds for sub-optimal samples and regret are established.
Empirical results on MovieLens and Goodreads datasets validate COF's effectiveness.
Abstract
The classic multi-armed bandit (MAB) problem tackles the challenge of accruing maximum reward while making decisions under uncertainty. However, in applications the goal is often to minimize cost subject to a constraint on the minimum permissible reward, an objective captured by multi-armed bandits with cost-subsidy (MAB-CS). Of interest to this paper is the setting where the quality (reward) constraint is specified relative to the unknown best reward and the cost of each arm is known. We characterize the expected sub-optimal samples required by any policy by proving instance-dependent lower bounds that offer new insight into the problem and are a strict generalization of prior bounds. Then, we propose an algorithm called Cost-Ordered Feasibility (COF) that leverages our insight and intelligently combines samples from all arms to gauge the feasibility of a cheap arm. Thereafter, we…
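To make the MAB-CS objective described in the abstract concrete, the following is a minimal sketch of the target a policy is trying to identify: the cheapest arm whose mean reward meets a constraint set relative to the best mean reward. It is not the COF algorithm. The function name `cheapest_feasible_arm`, the subsidy factor `alpha`, and the threshold form `(1 - alpha) * best_mean` are assumptions made for illustration, and the example numbers are invented. In the actual problem the mean rewards are unknown and must be estimated from samples.

```python
from typing import Sequence


def cheapest_feasible_arm(
    mean_rewards: Sequence[float],
    costs: Sequence[float],
    alpha: float,
) -> int:
    """Return the index of the cheapest arm that satisfies the reward constraint.

    Assumption (not taken from the abstract): an arm is feasible when its
    mean reward is at least (1 - alpha) times the best mean reward.
    """
    if not mean_rewards or len(mean_rewards) != len(costs):
        raise ValueError("mean_rewards and costs must be non-empty and equal in length")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")

    # Reward constraint defined relative to the best arm.
    threshold = (1.0 - alpha) * max(mean_rewards)

    # The best arm always meets the threshold, so this list is never empty.
    feasible = [i for i, mu in enumerate(mean_rewards) if mu >= threshold]

    # Cheapest feasible arm; ties are broken by the lower index.
    return min(feasible, key=lambda i: (costs[i], i))


# Hypothetical instance: three arms with known costs.
means = [0.9, 0.8, 0.5]
costs = [3.0, 2.0, 1.0]
target = cheapest_feasible_arm(means, costs, alpha=0.2)
```

With these made-up values, arm 2 is the cheapest overall but falls below the threshold, so the target is arm 1. Setting `alpha=0` recovers the classic MAB target (the best-reward arm), while a larger `alpha` admits cheaper arms.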
