Continuum-armed Bandit Optimization with Batch Pairwise Comparison Oracles

Xiangyu Chang; Xi Chen; Yining Wang; Zhiyi Zeng

arXiv:2505.22361·cs.LG·May 29, 2025

TL;DR

This paper introduces a bandit optimization approach built on pairwise comparison oracles that return biased estimates, applies it to pricing, inventory, and revenue management problems, and achieves near-optimal regret bounds.

Contribution

It develops a new algorithm combining discretization, local polynomial approximation, and a tournament elimination method for pairwise comparison bandits with biased feedback.
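To make the tournament elimination idea concrete, here is a minimal sketch of a plain knockout tournament over a discretized action set. It is not the authors' algorithm: it omits the local polynomial approximation step, and the objective `f`, the helper `compare`, the noise and bias model, and every parameter value are hypothetical choices made only for illustration.

```python
import random

def f(x):
    # Hypothetical strongly concave objective with maximizer at x = 0.6.
    return -(x - 0.6) ** 2

def compare(x, x_prime, n, rng, noise_sd=1.0, bias_scale=0.5):
    # Assumed oracle model (not from the paper): after n periods the estimate
    # of f(x) - f(x_prime) carries noise of standard deviation
    # noise_sd / sqrt(n) and a bias of magnitude at most bias_scale / n.
    noise = rng.gauss(0.0, noise_sd / n ** 0.5)
    bias = rng.uniform(-bias_scale, bias_scale) / n
    return f(x) - f(x_prime) + noise + bias

def knockout_tournament(grid, n, rng):
    # Pair up the surviving grid points, keep the apparent winner of each
    # pair, and repeat until a single candidate remains.
    survivors = list(grid)
    while len(survivors) > 1:
        next_round = []
        for i in range(0, len(survivors) - 1, 2):
            a, b = survivors[i], survivors[i + 1]
            next_round.append(a if compare(a, b, n, rng) >= 0 else b)
        if len(survivors) % 2 == 1:
            # An unpaired candidate advances without a comparison.
            next_round.append(survivors[-1])
        survivors = next_round
    return survivors[0]

# Discretize [0, 1] into 11 evenly spaced candidate actions.
grid = [k / 10 for k in range(11)]
best = knockout_tournament(grid, n=1_000_000, rng=random.Random(0))
```

The batch length `n` is held fixed here for simplicity. The source notes that choosing the stopping time is itself part of the decision problem, so a fixed `n` should be read as a simplification rather than a recommendation.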

Findings

01 Achieves regret bounds close to the theoretical optimum.

02 Improves existing results in inventory and revenue management applications.

03 Introduces a new framework for biased pairwise comparison oracles.

Abstract

This paper studies a bandit optimization problem where the goal is to maximize a function $f(x)$ over $T$ periods for some unknown strongly concave function $f$. We consider a new pairwise comparison oracle, where the decision-maker chooses a pair of actions $(x, x')$ for a consecutive number of periods and then obtains an estimate of $f(x) - f(x')$. We show that such a pairwise comparison oracle finds important applications to joint pricing and inventory replenishment problems and network revenue management. The challenge in this bandit optimization is twofold. First, the decision-maker not only needs to determine a pair of actions $(x, x')$ but also a stopping time $n$ (i.e., the number of queries based on $(x, x')$). Second, motivated by our inventory application, the estimate of the difference $f(x) - f(x')$ is biased, which is different from existing oracles in stochastic optimization…
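The oracle described above can be simulated with a short sketch. The abstract states only that the estimate of $f(x) - f(x')$ is biased; the specific model below (noise shrinking like $1/\sqrt{n}$ and bias bounded by a constant over $n$), the objective `f`, the function name `batch_comparison_oracle`, and all parameter values are assumptions introduced here for illustration.

```python
import random

def f(x):
    # Hypothetical strongly concave objective with maximizer at x = 0.6.
    return -(x - 0.6) ** 2

def batch_comparison_oracle(x, x_prime, n, rng, bias_scale=0.5, noise_sd=1.0):
    """Return a noisy, biased estimate of f(x) - f(x_prime) after n periods.

    Assumed model (not taken from the paper): the noise has standard
    deviation noise_sd / sqrt(n) and the bias has magnitude at most
    bias_scale / n.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    true_gap = f(x) - f(x_prime)
    noise = rng.gauss(0.0, noise_sd / n ** 0.5)
    bias = rng.uniform(-bias_scale, bias_scale) / n
    return true_gap + noise + bias

rng = random.Random(0)
# A short batch gives a rough estimate; a long batch gives a sharper one.
short_run = batch_comparison_oracle(0.6, 0.2, n=10, rng=rng)
long_run = batch_comparison_oracle(0.6, 0.2, n=10_000, rng=rng)
```

Under this assumed model, a longer batch buys a more accurate comparison at the cost of more periods spent on a possibly suboptimal pair, which is one way to read the stopping-time trade-off the abstract mentions.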

Taxonomy

Topics: Advanced Bandit Algorithms Research · Stochastic Gradient Optimization Techniques · Optimization and Search Problems