Multi-armed Bandits with Cost Subsidy
Deeksha Sinha, Karthik Abinav Sankararaman, Abbas Kazerouni, Vashist Avadhanula

TL;DR
This paper introduces MAB with cost subsidy, a variant of the multi-armed bandit problem for real-world settings in which the learning agent pays a cost to select an arm and must optimize both cumulative rewards and cumulative costs.
Contribution
It formulates the MAB with cost subsidy problem, establishes fundamental lower bounds, and proposes near-optimal algorithms with practical recommendations.
Findings
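Naive generalizations of existing MAB algorithms such as Upper Confidence Bound and Thompson Sampling do not perform well on this problem.
A fundamental lower bound on the performance of any online learning algorithm is established, showing that the problem is harder than the classical MAB problem.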
A simple explore-then-commit algorithm achieves near-optimal regret.
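The explore-then-commit idea can be illustrated with a minimal sketch. It is not the paper's algorithm: the summary above does not give the formal objective, so the sketch assumes that costs are known, that a subsidy factor `alpha` defines which arms are acceptable (empirical mean reward at least `(1 - alpha)` times the best empirical mean), and that the agent commits to the cheapest acceptable arm using plain empirical means rather than confidence bounds. The names `etc_cost_subsidy`, `pull`, `costs`, `alpha`, and `explore_rounds` are hypothetical.

```python
import random


def etc_cost_subsidy(pull, costs, alpha, explore_rounds, horizon):
    """Illustrative explore-then-commit with a cost subsidy (assumed setup).

    pull(arm) returns a reward; costs[arm] is the known cost of the arm.
    Returns the committed arm and the empirical mean rewards.
    """
    n_arms = len(costs)
    totals = [0.0] * n_arms
    pulls_used = 0

    # Exploration phase: pull every arm the same number of times.
    for _ in range(explore_rounds):
        for arm in range(n_arms):
            totals[arm] += pull(arm)
            pulls_used += 1

    means = [total / explore_rounds for total in totals]

    # Assumption: an arm is acceptable if its empirical mean reward is
    # within a (1 - alpha) factor of the best empirical mean.
    threshold = (1.0 - alpha) * max(means)
    acceptable = [arm for arm in range(n_arms) if means[arm] >= threshold]

    # Commit phase: play the cheapest acceptable arm for the remaining rounds.
    chosen = min(acceptable, key=lambda arm: costs[arm])
    for _ in range(horizon - pulls_used):
        pull(chosen)

    return chosen, means


# Hypothetical usage: three Bernoulli arms with made-up means and costs.
rng = random.Random(0)
true_means = [0.9, 0.85, 0.5]
arm_costs = [1.0, 0.4, 0.1]
arm, estimates = etc_cost_subsidy(
    pull=lambda a: 1.0 if rng.random() < true_means[a] else 0.0,
    costs=arm_costs,
    alpha=0.1,
    explore_rounds=200,
    horizon=5000,
)
```

With `alpha = 0` the rule reduces to committing to the arm with the best empirical reward; a larger `alpha` trades reward for lower cost.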
Abstract
In this paper, we consider a novel variant of the multi-armed bandit (MAB) problem, MAB with cost subsidy, which models many real-life applications where the learning agent has to pay to select an arm and is concerned about optimizing cumulative costs and rewards. We present two applications, intelligent SMS routing problem and ad audience optimization problem faced by several businesses (especially online platforms), and show how our problem uniquely captures key features of these applications. We show that naive generalizations of existing MAB algorithms like Upper Confidence Bound and Thompson Sampling do not perform well for this problem. We then establish a fundamental lower bound on the performance of any online learning algorithm for this problem, highlighting the hardness of our problem in comparison to the classical MAB problem. We also present a simple variant of…
Taxonomy
Topics: Advanced Bandit Algorithms Research · Optimization and Search Problems · Machine Learning and Algorithms
