Bandits with Movement Costs and Adaptive Pricing
Tomer Koren, Roi Livni, Yishay Mansour

TL;DR
This paper introduces a new algorithm for multi-armed bandits with movement costs modeled by a metric, achieving optimal regret rates and applying it to adaptive pricing with improved regret bounds.
Contribution
The paper develops a new algorithm for bandits whose movement costs are given by a tree metric over the actions (a complete binary tree), attains the optimal regret rate for Lipschitz bandit learning, and applies the algorithm to adaptive pricing.
Findings
Achieves regret of $\widetilde{O}(\sqrt{kT} + T/k)$ for bandits with movement costs, where $k$ is the number of actions and $T$ is the time horizon.
Attains the optimal $T^{2/3}$ regret in Lipschitz bandit learning.
Improves adaptive pricing regret from $T^{3/4}$ to $T^{2/3}$.
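A back-of-the-envelope calculation shows how a $k$-action bound can yield the Lipschitz rate. It assumes the bound has the form $\sqrt{kT} + T/k$ (logarithmic factors dropped) and that a 1-Lipschitz loss on $[0,1]$ is played only at $k$ evenly spaced points, so each round loses at most about $1/k$ against the best point of the interval. This is a simplified sketch, not the paper's proof.

```latex
\mathrm{Regret}(T) \;\lesssim\;
\underbrace{\sqrt{kT} + \frac{T}{k}}_{k\text{ actions, movement costs}}
\;+\; \underbrace{\frac{T}{k}}_{\text{discretization}},
\qquad
k = T^{1/3} \;\Longrightarrow\;
\sqrt{T^{4/3}} + \frac{2T}{T^{1/3}} = T^{2/3} + 2T^{2/3} = 3\,T^{2/3}.
```

Choosing $k = T^{1/3}$ balances the exploration term against the movement and discretization terms, which is why the switching penalty does not worsen the $T^{2/3}$ rate in this setting.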
Abstract
We extend the model of Multi-armed Bandit with unit switching cost to incorporate a metric between the actions. We consider the case where the metric over the actions can be modeled by a complete binary tree, and the distance between two leaves is the size of the subtree of their least common ancestor, which abstracts the case that the actions are points on the continuous interval and the switching cost is their distance. In this setting, we give a new algorithm that establishes a regret of $\widetilde{O}(\sqrt{kT} + T/k)$, where $k$ is the number of actions and $T$ is the time horizon. When the set of actions corresponds to the whole $[0,1]$ interval, we can exploit our method for the task of bandit learning with Lipschitz loss functions, where our algorithm achieves an optimal regret rate of $\widetilde{\Theta}(T^{2/3})$, which is the same rate one obtains when there is no penalty…
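To make the tree metric concrete, here is a minimal Python sketch, not the authors' code. It assumes $k = 2^d$ leaves numbered left to right and reads "size of the subtree" as the number of leaves under the least common ancestor; the helper names `tree_distance` and `total_cost` are hypothetical. The second helper adds these switching costs to the per-round losses of a played sequence, with distances rescaled by $1/k$ (also an assumption) so that crossing the root costs 1, as on the unit interval.

```python
from typing import Sequence


def tree_distance(i: int, j: int, k: int) -> int:
    """Tree-metric distance between leaves i and j of a complete binary tree.

    Leaves are indexed 0..k-1 from left to right, with k a power of two.
    Assumption: "size of the subtree" means the number of leaves under the
    least common ancestor (LCA). Identical leaves get distance 0.
    """
    if k <= 0 or k & (k - 1):
        raise ValueError("k must be a positive power of two")
    if not (0 <= i < k and 0 <= j < k):
        raise ValueError("leaf index out of range")
    if i == j:
        return 0  # staying on the same action costs nothing
    # Leaves i and j sit under the same node at height h exactly when
    # i >> h == j >> h, so the LCA is at height bit_length(i ^ j) and
    # its subtree holds 2**height leaves.
    return 1 << (i ^ j).bit_length()


def total_cost(actions: Sequence[int], losses: Sequence[Sequence[float]], k: int) -> float:
    """Cumulative loss plus movement cost of a sequence of played leaves.

    actions[t] is the leaf played in round t; losses[t][a] is the loss of
    leaf a in round t. Assumption: distances are divided by k so that a
    switch across the root costs 1, like moving across the unit interval.
    """
    played = sum(losses[t][a] for t, a in enumerate(actions))
    movement = sum(tree_distance(a, b, k) / k for a, b in zip(actions, actions[1:]))
    return played + movement
```

For instance, with 8 leaves, `tree_distance(1, 2, 8)` is 4 because leaves 1 and 2 first meet at the node covering leaves 0 to 3, while the neighbouring leaves 3 and 4 meet only at the root and are at distance 8. The tree distance always upper-bounds the plain index gap $|i - j|$, which is how it abstracts distance on the interval.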
Taxonomy
Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Optimization and Search Problems
