Multi-armed Bandit Requiring Monotone Arm Sequences

Ningyuan Chen

arXiv:2106.03790 · cs.LG · October 8, 2021 · 1 citation

TL;DR

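This paper studies a continuum-armed bandit problem in which the sequence of pulled arms must be monotone over time. It shows that the constraint raises the optimal regret for unimodal or quasiconcave objectives from the $\tilde{O}(T^{2/3})$ rate of the unconstrained setting to $\tilde{O}(T^{3/4})$, and that regret is $O(T)$ when the objective is only Lipschitz continuous.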

Contribution

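The paper proposes an algorithm for continuum-armed bandits whose arm sequence must be monotone, proves that it attains $\tilde{O}(T^{3/4})$ regret for unimodal or quasiconcave objectives, and shows that this rate is optimal, which quantifies the cost of the monotonicity requirement to learning efficiency.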

Findings

01

Regret is $O(T)$ for Lipschitz continuous objectives.

02

Regret is $\tilde{O}(T^{3/4})$ for unimodal or quasiconcave objectives.

03

Monotonicity constraints raise the optimal regret from $\tilde{O}(T^{2/3})$ in the unconstrained continuum-armed setting to $\tilde{O}(T^{3/4})$.
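
To get a feel for the size of the gap between the two rates quoted in the abstract, $\tilde{O}(T^{3/4})$ with the monotonicity constraint and $\tilde{O}(T^{2/3})$ without it, the short sketch below evaluates their ratio $T^{3/4} / T^{2/3} = T^{1/12}$. It ignores constants and logarithmic factors, so it illustrates only the asymptotic scaling, not actual regret values; the function name `regret_rate_gap` is hypothetical.

```python
def regret_rate_gap(horizon: float) -> float:
    """Ratio T^(3/4) / T^(2/3) = T^(1/12).

    Assumption: constants and log factors hidden by the O-tilde
    notation are dropped, so only the polynomial scaling is compared.
    """
    return horizon ** (3 / 4 - 2 / 3)


# Print the ratio for a few horizons to show how slowly the gap grows.
for horizon in (1e3, 1e6, 1e12):
    print(f"T = {horizon:.0e}: T^(3/4) / T^(2/3) = {regret_rate_gap(horizon):.2f}")
```

The gap grows only as $T^{1/12}$, so the horizon must reach $T = 10^{12}$ before the ratio of the two rates equals 10.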

Abstract

In many online learning or multi-armed bandit problems, the taken actions or pulled arms are ordinal and required to be monotone over time. Examples include dynamic pricing, in which the firms use markup pricing policies to please early adopters and deter strategic waiting, and clinical trials, in which the dose allocation usually follows the dose escalation principle to prevent dose limiting toxicities. We consider the continuum-armed bandit problem when the arm sequence is required to be monotone. We show that when the unknown objective function is Lipschitz continuous, the regret is $O(T)$. When in addition the objective function is unimodal or quasiconcave, the regret is $\tilde{O}(T^{3/4})$ under the proposed algorithm, which is also shown to be the optimal rate. This deviates from the optimal rate $\tilde{O}(T^{2/3})$ in the continuous-armed bandit literature and demonstrates the…
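
The monotonicity requirement itself can be illustrated with a small simulation. The sketch below is not the paper's algorithm: it is a simplified, hypothetical sweep that moves upward through a grid on $[0, 1]$, pulls each arm a fixed number of times, and commits to the current arm as soon as the empirical mean drops, because a non-decreasing sequence can never return to a lower arm. The names `monotone_sweep`, `mean_reward`, `num_arms`, and `pulls_per_arm`, the Gaussian noise model, and the stopping rule are all assumptions made for illustration.

```python
import random


def monotone_sweep(mean_reward, horizon, num_arms, pulls_per_arm,
                   noise_sd=0.1, seed=0):
    """Return a non-decreasing sequence of `horizon` arms in [0, 1].

    Illustrative sketch only (assumed stopping rule, assumed noise).
    Requires num_arms >= 2 and pulls_per_arm >= 1.
    """
    rng = random.Random(seed)
    grid = [i / (num_arms - 1) for i in range(num_arms)]
    arms_played = []
    best_mean = float("-inf")
    idx = 0
    t = 0
    while t < horizon:
        arm = grid[idx]
        # Pull the current arm, never exceeding the remaining budget.
        n = min(pulls_per_arm, horizon - t)
        total = 0.0
        for _ in range(n):
            total += mean_reward(arm) + rng.gauss(0.0, noise_sd)
            arms_played.append(arm)
        t += n
        mean = total / n
        if mean < best_mean or idx == num_arms - 1:
            # The empirical mean dropped (or the grid ended): commit to
            # this arm, since moving back down would break monotonicity.
            arms_played.extend([arm] * (horizon - t))
            break
        best_mean = mean
        idx += 1
    return arms_played


# Usage: a unimodal mean reward with its peak at 0.4 (hypothetical).
arms = monotone_sweep(lambda x: 1.0 - (x - 0.4) ** 2,
                      horizon=1000, num_arms=11, pulls_per_arm=20)
print(f"final arm: {arms[-1]:.1f}, distinct arms tried: {len(set(arms))}")
```

Because the sweep can only detect the peak after stepping past it, it necessarily commits to an arm at or beyond the point where the estimate fell, which gives an informal sense of why the constraint costs regret relative to the unconstrained setting.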

Videos

Multi-armed Bandit Requiring Monotone Arm Sequences · SlidesLive

Taxonomy

Topics: Advanced Bandit Algorithms Research · Optimization and Search Problems · Machine Learning and Algorithms