Continuum-armed Bandit Optimization with Batch Pairwise Comparison Oracles
Xiangyu Chang, Xi Chen, Yining Wang, Zhiyi Zeng

TL;DR
This paper introduces a bandit optimization approach built on pairwise comparison oracles with biased estimates, applies it to pricing, inventory, and revenue management, and achieves near-optimal regret bounds.
Contribution
It develops a new algorithm combining discretization, local polynomial approximation, and a tournament elimination method for pairwise comparison bandits with biased feedback.
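As a rough illustration of the tournament elimination ingredient only, the sketch below runs a single-elimination tournament over a discretized action grid. It is not the paper's algorithm: the local polynomial approximation step is omitted, and the names `noisy_compare` and `tournament_elimination`, the 1/n bias, and the Gaussian noise model are all hypothetical choices made for this example.

```python
import random

def noisy_compare(f, x, y, n, rng, bias_scale=0.5, noise_sd=1.0):
    """Hypothetical oracle: estimate f(x) - f(y) from a batch of n periods.

    Assumption for illustration: bias decays like 1/n and the noise
    standard deviation like 1/sqrt(n).
    """
    bias = bias_scale / n
    noise = rng.gauss(0.0, noise_sd / n ** 0.5)
    return f(x) - f(y) + bias + noise

def tournament_elimination(f, grid, n, rng):
    """Single-elimination tournament over a discretized action grid."""
    survivors = list(grid)
    while len(survivors) > 1:
        next_round = []
        # Pair up neighbours; the winner of each comparison advances.
        for i in range(0, len(survivors) - 1, 2):
            x, y = survivors[i], survivors[i + 1]
            next_round.append(x if noisy_compare(f, x, y, n, rng) >= 0 else y)
        # With an odd number of survivors, the last one advances unopposed.
        if len(survivors) % 2 == 1:
            next_round.append(survivors[-1])
        survivors = next_round
    return survivors[0]

# Toy strongly concave objective with maximizer 0.3 (an assumed example).
f = lambda x: -(x - 0.3) ** 2
grid = [k / 10 for k in range(11)]
best = tournament_elimination(f, grid, n=10**6, rng=random.Random(0))
```

The batch length `n` is fixed here for simplicity; choosing it adaptively per pair is part of what the paper treats as a stopping-time decision.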
Findings
Achieves regret bounds close to the theoretical optimum.
Improves existing results in inventory and revenue management applications.
Introduces a new framework for biased pairwise comparison oracles.
Abstract
This paper studies a bandit optimization problem where the goal is to maximize a function $f(x)$ over $T$ periods for some unknown strongly concave function $f$. We consider a new pairwise comparison oracle, where the decision-maker chooses a pair of actions $(x, x')$ for a consecutive number of periods and then obtains an estimate of $f(x) - f(x')$. We show that such a pairwise comparison oracle finds important applications to joint pricing and inventory replenishment problems and network revenue management. The challenge in this bandit optimization is twofold. First, the decision-maker not only needs to determine a pair of actions $(x, x')$ but also a stopping time $n$ (i.e., the number of queries based on $(x, x')$). Second, motivated by our inventory application, the estimate of the difference $f(x) - f(x')$ is biased, which is different from existing oracles in stochastic optimization…
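To make the role of bias concrete, here is a minimal simulation of a batch comparison oracle. The function `batch_oracle`, the 1/n bias, and the Gaussian noise are assumptions made for illustration and are not taken from the paper. The point it shows: repeating short batches averages away noise but not bias, whereas a longer batch reduces both.

```python
import random
import statistics

def f(x):
    # Toy strongly concave objective (assumed for this example).
    return -(x - 0.5) ** 2

def batch_oracle(x, x_prime, n, rng, bias_const=1.0, noise_sd=1.0):
    """Hypothetical biased estimate of f(x) - f(x') from n consecutive periods."""
    true_diff = f(x) - f(x_prime)
    bias = bias_const / n                        # assumed: bias shrinks like 1/n
    noise = rng.gauss(0.0, noise_sd / n ** 0.5)  # assumed: noise sd shrinks like 1/sqrt(n)
    return true_diff + bias + noise

rng = random.Random(0)
true_diff = f(0.5) - f(0.2)

# Average many short batches: the noise cancels, the bias of 1/4 remains.
mean_short = statistics.fmean(batch_oracle(0.5, 0.2, 4, rng) for _ in range(20000))

# Average the same number of long batches: the bias of 1/400 is much smaller.
mean_long = statistics.fmean(batch_oracle(0.5, 0.2, 400, rng) for _ in range(20000))

print(f"true difference: {true_diff:.4f}")
print(f"mean of n=4 batches: {mean_short:.4f}")
print(f"mean of n=400 batches: {mean_long:.4f}")
```

Under these assumptions, the batch length is a genuine decision variable: a longer batch yields a more accurate comparison but spends more periods on a possibly suboptimal pair.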
Taxonomy
Topics: Advanced Bandit Algorithms Research · Stochastic Gradient Optimization Techniques · Optimization and Search Problems
