Cost-aware Cascading Bandits
Ruida Zhou, Chao Gan, Jing Yan, Cong Shen

TL;DR
This paper introduces a cost-aware cascading bandits model in which the learning agent chooses both the order in which items are examined and when to stop, so as to maximize expected net reward under random examination costs. It gives an optimal offline policy and an online algorithm with a logarithmic regret bound.
Contribution
The paper proposes a cost-aware cascading bandits framework, derives the optimal policy for the offline setting, and gives an online algorithm with a logarithmic regret bound.
Findings
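The Unit Cost Ranking with Threshold 1 (UCR-T1) policy is optimal in the offline setting, where the state and cost statistics of the items are known.
The CC-UCB algorithm achieves O(log T) regret in the online setting, where those statistics must be learned.
Experimental results validate the effectiveness of the proposed methods.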
Abstract
In this paper, we propose a cost-aware cascading bandits model, a new variant of multi-armed bandits with cascading feedback, by considering the random cost of pulling arms. In each step, the learning agent chooses an ordered list of items and examines them sequentially, until a certain stopping condition is satisfied. Our objective is then to maximize the expected net reward in each step, i.e., the reward obtained in each step minus the total cost incurred in examining the items, by deciding the ordered list of items, as well as when to stop examination. We study both the offline and online settings, depending on whether the state and cost statistics of the items are known beforehand. For the offline setting, we show that the Unit Cost Ranking with Threshold 1 (UCR-T1) policy is optimal. For the online setting, we propose a Cost-aware Cascading Upper Confidence Bound…
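The offline objective described in the abstract can be illustrated with a small sketch. It is not taken from the paper: it assumes each item i has a binary state with known mean theta[i] and a known mean cost cost[i], that examination stops at the first item whose state is 1 (which yields a reward of 1), and that "unit cost" means cost[i] / theta[i]. The function names and the example numbers are hypothetical. Under those assumptions, the sketch ranks items by ascending unit cost, keeps only those with unit cost below 1, and compares the result against brute-force enumeration.

```python
from itertools import permutations


def expected_net_reward(order, theta, cost):
    """Expected reward minus expected cost of examining `order` in sequence.

    Assumption: examination stops at the first item whose state is 1
    (reward 1), and an item's cost is paid whenever it is examined.
    """
    reach = 1.0  # probability that examination gets as far as this item
    total = 0.0
    for i in order:
        total += reach * (theta[i] - cost[i])
        reach *= 1.0 - theta[i]
    return total


def ucr_t1(theta, cost):
    """Keep items with unit cost cost/theta below 1, ranked by ascending unit cost.

    Assumption: this is one reading of "Unit Cost Ranking with Threshold 1".
    """
    keep = [i for i in range(len(theta))
            if theta[i] > 0 and cost[i] / theta[i] < 1.0]
    return sorted(keep, key=lambda i: cost[i] / theta[i])


def best_by_enumeration(theta, cost):
    """Brute-force check: best value over every ordered sub-list (small n only)."""
    n = len(theta)
    return max(
        expected_net_reward(order, theta, cost)
        for k in range(n + 1)
        for order in permutations(range(n), k)
    )


# Hypothetical statistics for four items.
theta = [0.5, 0.2, 0.8, 0.3]
cost = [0.1, 0.3, 0.4, 0.09]

ranked = ucr_t1(theta, cost)
gap = best_by_enumeration(theta, cost) - expected_net_reward(ranked, theta, cost)
print(ranked, gap)
```

In this example the item with cost above its success probability is dropped, and the ranked list matches the best ordered sub-list found by enumeration. The online setting, where theta and cost must be estimated, is not covered by this sketch.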
Taxonomy
Topics: Advanced Bandit Algorithms Research · Optimization and Search Problems · Machine Learning and Algorithms
