A Farewell to Arms: Sequential Reward Maximization on a Budget with a Giving Up Option
P Sharoff, Nishant A. Mehta, Ravi Ganti

TL;DR
This paper introduces a new sequential decision-making model with a giving-up option, formulates it as a stochastic bandit problem with resource consumption, and proposes an algorithm with proven regret bounds.
Contribution
It formulates a novel bandit problem with giving-up options and develops WAIT-UCB, an algorithm with improved regret bounds over existing methods.
Findings
WAIT-UCB achieves logarithmic regret bounds.
Simulation results show WAIT-UCB outperforms state-of-the-art algorithms.
Optimal arm selection is based on reward-to-waiting-time ratio.
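To make the ratio criterion concrete, here is a minimal sketch that ranks arms by the empirical ratio of mean reward to mean waiting time. It is not WAIT-UCB: it has no confidence bonus and ignores the giving-up option. The function name `best_arm_by_ratio` and the sample data are hypothetical and chosen only for illustration.

```python
from statistics import mean

def best_arm_by_ratio(samples):
    """Pick the arm with the largest empirical reward-to-waiting-time ratio.

    samples: dict mapping an arm name to a list of (reward, waiting_time)
    pairs. This is a plain plug-in estimate, not the paper's algorithm.
    """
    ratios = {}
    for arm, observations in samples.items():
        avg_reward = mean(r for r, _ in observations)
        avg_wait = mean(w for _, w in observations)
        ratios[arm] = avg_reward / avg_wait
    best = max(ratios, key=ratios.get)
    return best, ratios

# Hypothetical data: arm "B" pays more per pull but takes far longer.
samples = {
    "A": [(1.0, 2.0), (1.0, 2.0)],
    "B": [(3.0, 10.0), (3.0, 10.0)],
}
best, ratios = best_arm_by_ratio(samples)
print(best, ratios)
```

In this toy data, arm "A" has the smaller reward per pull but the larger reward per unit of waiting time, so under a fixed time budget it is the better choice.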
Abstract
We consider a sequential decision-making problem where an agent can take one action at a time and each action has a stochastic temporal extent, i.e., a new action cannot be taken until the previous one is finished. Upon completion, the chosen action yields a stochastic reward. The agent seeks to maximize its cumulative reward over a finite time budget, with the option of "giving up" on a current action -- hence forfeiting any reward -- in order to choose another action. We cast this problem as a variant of the stochastic multi-armed bandits problem with stochastic consumption of resource. For this problem, we first establish that the optimal arm is the one that maximizes the ratio of the expected reward of the arm to the expected waiting time before the agent sees the reward due to pulling that arm. Using a novel upper confidence bound on this ratio, we then introduce an upper…
Taxonomy
Topics: Advanced Bandit Algorithms Research · Auction Theory and Applications · Consumer Market Behavior and Pricing
