A Farewell to Arms: Sequential Reward Maximization on a Budget with a Giving Up Option

P Sharoff; Nishant A. Mehta; Ravi Ganti

arXiv:2003.03456 · cs.LG · March 26, 2020 · 1 citation

TL;DR

This paper introduces a new sequential decision-making model with a giving-up option, formulates it as a stochastic bandit problem with resource consumption, and proposes an algorithm with proven regret bounds.

Contribution

The paper formulates a novel bandit problem with a giving-up option and develops WAIT-UCB, an upper-confidence-bound algorithm with improved regret bounds over existing methods.

Findings

01. WAIT-UCB achieves logarithmic regret bounds.

02. In simulations, WAIT-UCB outperforms state-of-the-art algorithms.

03. The optimal arm is the one that maximizes the ratio of expected reward to expected waiting time.
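The ratio rule in finding 03 can be illustrated with a small simulation. The sketch below is a minimal illustration under our own assumptions, not the paper's experimental setup: each arm pays a fixed reward per completed pull, waiting times are exponential, and a pull that would finish after the budget earns nothing. The names `Arm`, `reward_rate`, `best_by_ratio`, `best_by_reward`, and `simulate`, and the arm parameters, are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class Arm:
    name: str
    mean_reward: float  # E[X]: reward paid when a pull completes (fixed here for simplicity)
    mean_wait: float    # E[T]: expected time until the pull completes


def reward_rate(arm: Arm) -> float:
    """Reward per unit time when this arm is pulled back to back: E[X] / E[T]."""
    return arm.mean_reward / arm.mean_wait


def best_by_ratio(arms: list[Arm]) -> Arm:
    return max(arms, key=reward_rate)


def best_by_reward(arms: list[Arm]) -> Arm:
    return max(arms, key=lambda a: a.mean_reward)


def simulate(arm: Arm, budget: float, rng: random.Random) -> float:
    """Pull `arm` repeatedly until the time budget is exhausted.

    Assumption: waiting times are exponential with mean `arm.mean_wait`;
    a pull that would complete after the budget yields no reward.
    """
    elapsed, total = 0.0, 0.0
    while True:
        wait = rng.expovariate(1.0 / arm.mean_wait)
        if elapsed + wait > budget:
            return total
        elapsed += wait
        total += arm.mean_reward


if __name__ == "__main__":
    slow_big = Arm("A", mean_reward=10.0, mean_wait=20.0)  # ratio 0.5
    fast_small = Arm("B", mean_reward=2.0, mean_wait=2.0)  # ratio 1.0
    arms = [slow_big, fast_small]
    print(best_by_reward(arms).name, best_by_ratio(arms).name)  # A B
    rng = random.Random(0)
    for arm in arms:
        print(arm.name, simulate(arm, budget=10_000.0, rng=rng))
```

Arm A has the larger reward per pull, but arm B collects reward twice as fast per unit of time, so over a long budget B earns roughly twice as much. This is the sense in which the ratio, not the reward alone, identifies the optimal arm.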

Abstract

We consider a sequential decision-making problem where an agent can take one action at a time and each action has a stochastic temporal extent, i.e., a new action cannot be taken until the previous one is finished. Upon completion, the chosen action yields a stochastic reward. The agent seeks to maximize its cumulative reward over a finite time budget, with the option of "giving up" on a current action (hence forfeiting any reward) in order to choose another action. We cast this problem as a variant of the stochastic multi-armed bandit problem with stochastic resource consumption. For this problem, we first establish that the optimal arm is the one that maximizes the ratio of the expected reward of the arm to the expected waiting time before the agent sees the reward due to pulling that arm. Using a novel upper confidence bound on this ratio, we then introduce an upper…
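The giving-up option in the abstract can be made concrete with a renewal-reward calculation. The sketch below assumes one simple, hypothetical giving-up policy, not necessarily the model used in the paper: fix a patience c, abandon the current action (forfeiting its reward) if it has not finished by time c, and retry. If the reward has mean μ and is independent of the waiting time T, each attempt earns μ·P(T ≤ c) in expectation and consumes E[min(T, c)] time, so the long-run rate is μ·P(T ≤ c) / E[min(T, c)], i.e., the same reward-to-time ratio with a truncated waiting time. The functions `rate_with_patience` and `best_patience` and the two-point waiting-time distribution are illustrative assumptions.

```python
from fractions import Fraction
from typing import Optional

# A discrete waiting-time distribution: list of (time, probability) pairs.
# Times are assumed positive and probabilities are assumed to sum to 1.
WaitDist = list[tuple[Fraction, Fraction]]


def rate_with_patience(mean_reward: Fraction, wait_dist: WaitDist,
                       patience: Optional[Fraction] = None) -> Fraction:
    """Long-run reward per unit time for the hypothetical policy
    'give up if the action is not done by `patience`, then retry'.

    patience=None means never give up. Assumes the reward is independent
    of the waiting time.
    """
    p_done = Fraction(0)    # P(T <= c): the attempt completes and pays out
    exp_time = Fraction(0)  # E[min(T, c)]: time consumed per attempt
    for t, p in wait_dist:
        if patience is None or t <= patience:
            p_done += p
            exp_time += p * t
        else:
            exp_time += p * patience  # gave up at time c, reward forfeited
    return mean_reward * p_done / exp_time


def best_patience(mean_reward: Fraction, wait_dist: WaitDist):
    """Search cutoffs at the support points of T (plus 'never give up').

    Between consecutive support points P(T <= c) is constant while
    E[min(T, c)] grows, so the rate can only fall; checking the support
    points is therefore enough for this fixed-patience policy.
    """
    candidates = [t for t, _ in wait_dist] + [None]
    best = max(candidates,
               key=lambda c: rate_with_patience(mean_reward, wait_dist, c))
    return best, rate_with_patience(mean_reward, wait_dist, best)


if __name__ == "__main__":
    # Hypothetical arm: finishes at time 1 or time 100 with equal probability.
    dist = [(Fraction(1), Fraction(1, 2)), (Fraction(100), Fraction(1, 2))]
    mu = Fraction(1)
    print(rate_with_patience(mu, dist))               # 2/101 (never give up)
    print(rate_with_patience(mu, dist, Fraction(1)))  # 1/2 (give up at time 1)
    print(best_patience(mu, dist))
```

In this toy example, always waiting for the action to finish earns about 0.02 reward per unit time, while giving up at time 1 raises that to 0.5: forfeiting some rewards can pay off when the waiting-time distribution has a slow tail.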

Taxonomy

Topics: Advanced Bandit Algorithms Research · Auction Theory and Applications · Consumer Market Behavior and Pricing