Multi-armed Bandit Requiring Monotone Arm Sequences
Ningyuan Chen

TL;DR
This paper studies a continuum-armed bandit problem in which the arm sequence must be monotone, showing that this constraint raises regret above the optimal rate for unconstrained continuum-armed bandits, with the achievable rate depending on the assumptions placed on the objective function.
Contribution
It introduces algorithms for monotone arm sequences in continuum-armed bandits and establishes regret bounds, highlighting the impact of monotonicity on learning efficiency.
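For reference, the regret being bounded can be written as follows. This is a sketch of a standard formulation, assuming reward maximization over arms in $[0,1]$, an unknown objective $f$, a policy $\pi$ choosing arms $X_t$, and a nondecreasing constraint (matching the markup-pricing and dose-escalation examples); the paper's exact notation and conventions may differ.

```latex
R_\pi(T) \;=\; T \max_{x \in [0,1]} f(x) \;-\; \mathbb{E}\Big[\sum_{t=1}^{T} f(X_t)\Big],
\qquad \text{subject to } X_1 \le X_2 \le \cdots \le X_T .
```

Without the constraint on the $X_t$, this is the usual continuum-armed bandit regret.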
Findings
Regret is $O(T)$ for Lipschitz continuous objectives.
Regret is $\tilde O(T^{3/4})$ for unimodal or quasiconcave objectives.
Monotonicity constraints increase regret relative to unconstrained continuum-armed bandits, whose optimal rate is $\tilde O(T^{2/3})$.
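To get a feel for the size of these gaps, the sketch below tabulates the leading-order regret scales at a few horizons. It drops all constants and logarithmic factors (an assumption that makes the numbers order-of-magnitude only); the rates themselves are $\tilde O(T^{2/3})$ for unconstrained Lipschitz continuum-armed bandits, $\tilde O(T^{3/4})$ for monotone arms with a unimodal or quasiconcave objective, and $O(T)$ for monotone arms with a merely Lipschitz objective. The function name `regret_scales` is hypothetical.

```python
def regret_scales(T: int) -> dict[str, float]:
    """Leading-order regret scale at horizon T; constants and log factors are dropped."""
    return {
        "unconstrained": T ** (2 / 3),      # optimal rate without monotonicity
        "monotone_unimodal": T ** (3 / 4),  # monotone arms, unimodal/quasiconcave f
        "monotone_lipschitz": float(T),     # monotone arms, Lipschitz f only
    }


for T in (10**4, 10**6, 10**8):
    s = regret_scales(T)
    gap = s["monotone_unimodal"] / s["unconstrained"]  # equals T ** (1 / 12)
    print(
        f"T={T:>11,}  T^(2/3)={s['unconstrained']:>12,.0f}  "
        f"T^(3/4)={s['monotone_unimodal']:>12,.0f}  T={s['monotone_lipschitz']:>13,.0f}  "
        f"gap={gap:.2f}"
    )
```

The ratio between the monotone-unimodal and unconstrained rates is $T^{1/12}$, which grows slowly: at $T = 10^6$ it is about 3.16.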
Abstract
In many online learning or multi-armed bandit problems, the taken actions or pulled arms are ordinal and required to be monotone over time. Examples include dynamic pricing, in which the firms use markup pricing policies to please early adopters and deter strategic waiting, and clinical trials, in which the dose allocation usually follows the dose escalation principle to prevent dose limiting toxicities. We consider the continuum-armed bandit problem when the arm sequence is required to be monotone. We show that when the unknown objective function is Lipschitz continuous, the regret is $O(T)$. When in addition the objective function is unimodal or quasiconcave, the regret is $\tilde O(T^{3/4})$ under the proposed algorithm, which is also shown to be the optimal rate. This deviates from the optimal rate $\tilde O(T^{2/3})$ in the continuous-armed bandit literature and demonstrates the…
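To make the constraint concrete, here is a minimal simulation sketch of a monotone (nondecreasing) arm sequence on a unimodal objective. The policy is a deliberately naive increasing sweep over a grid, written for illustration only; it is not the paper's proposed algorithm. The names `monotone_sweep` and `regret`, the Gaussian noise model, the detection threshold `z * noise / sqrt(m)`, the objective `f(x) = 1 - |x - 0.6|`, and the choices `K ≈ T^(1/4)` grid points with `m ≈ T^(1/2)` pulls each are all hypothetical assumptions.

```python
import random


def monotone_sweep(f, T, K, m, noise=0.1, z=3.0, seed=0):
    """Hypothetical increasing-sweep policy; NOT the paper's algorithm.

    Visits the grid 0 = x_0 < x_1 < ... < x_{K-1} = 1 in increasing order and
    samples each point m times. When a point's sample mean drops more than
    z * noise / sqrt(m) below the best mean seen so far, the peak is presumed
    passed; because arms may never decrease, the policy stays at the current
    point for the rest of the horizon.
    """
    assert K >= 2 and m >= 1
    rng = random.Random(seed)
    grid = [k / (K - 1) for k in range(K)]
    margin = z * noise / m ** 0.5  # crude detection threshold (assumption)
    arms, best_mean, k = [], float("-inf"), 0
    while len(arms) < T:
        x = grid[k]
        pulls = min(m, T - len(arms))
        rewards = [f(x) + rng.gauss(0.0, noise) for _ in range(pulls)]
        arms.extend([x] * pulls)
        mean = sum(rewards) / pulls
        best_mean = max(best_mean, mean)
        if mean < best_mean - margin or k == K - 1:
            arms.extend([x] * (T - len(arms)))  # commit: never move back down
            break
        k += 1
    return arms


def regret(f, arms, f_star):
    """Realized regret: sum over t of f_star - f(X_t)."""
    return sum(f_star - f(x) for x in arms)


# Example: a unimodal, 1-Lipschitz objective with its peak at x = 0.6.
def f(x):
    return 1.0 - abs(x - 0.6)


T = 10_000
arms = monotone_sweep(f, T, K=round(T ** 0.25), m=round(T ** 0.5))
assert all(a <= b for a, b in zip(arms, arms[1:]))  # monotone by construction
print(f"final arm {arms[-1]:.3f}, regret {regret(f, arms, f_star=1.0):.1f}")
```

In this sketch, overshooting the peak is costly because the policy can never step back to a smaller arm; it can only stay put, so any overshoot is paid for on every remaining round.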
Taxonomy
Topics: Advanced Bandit Algorithms Research · Optimization and Search Problems · Machine Learning and Algorithms
