Harnessing Unimodality in Semiparametric Contextual Pricing via Oracle Price Map Learning
Yingying Fan, Yuxuan Han, Jinchi Lv, Xiaocong Xu, Zhengyuan Zhou

TL;DR
This paper introduces a novel policy for semiparametric contextual pricing that leverages unimodality and smoothness of the oracle price map, achieving near-optimal regret bounds.
Contribution
It develops the RBIT policy, a modular approach combining coarse-to-fine learning and bandit convex optimization, with adaptive exploration for linear utility models.
Findings
Achieves regret $\left(\frac{2\beta-1}{4\beta-3} + \sqrt{dT}\right)$ in semiparametric pricing.
Establishes a minimax lower bound matching the nonparametric oracle map learning term.
Extends to high-dimensional sparse linear and nonparametric utility models.
Abstract
We study contextual dynamic pricing in a semiparametric scalar-index valuation model where the latent value is , with an unknown utility map and an unknown additive noise distribution. The key decision object is the one-dimensional oracle price map induced by the scalar index and the noise tail. Under the $\beta$-Hölder smoothness of the tail function for and a revenue-geometry condition that gives a unique, stable, interior maximizer, this oracle map is itself -smooth. We exploit such structure through RBIT, a modular coarse-to-fine policy that takes a scalar pilot index as input, localizes a benchmark price in each active bin, and learns a local polynomial approximation of the oracle map inside a trust region via bandit convex optimization. For the baseline…
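To make the oracle price map concrete, here is a minimal Python sketch under assumptions that are not taken from the paper: the noise is standard logistic, the buyer purchases whenever index plus noise is at least the posted price, and the noise tail is known. The names `logistic_tail`, `expected_revenue`, and `oracle_price` are hypothetical. The ternary search only illustrates how a unimodal revenue curve gives a well-defined maximizer for each scalar index; it is not the RBIT policy, which has to learn this map from purchase feedback.

```python
import math


def logistic_tail(x):
    """Tail S(x) = P(noise > x) for standard logistic noise (assumed for illustration)."""
    return 1.0 / (1.0 + math.exp(x))


def expected_revenue(price, index, tail=logistic_tail):
    """Expected revenue when the buyer purchases iff index + noise >= price."""
    return price * tail(price - index)


def oracle_price(index, lo=0.0, hi=20.0, tol=1e-8, tail=logistic_tail):
    """Ternary search for the revenue-maximizing price.

    Valid only when the revenue curve is unimodal on [lo, hi] and the
    maximizer lies inside that interval.
    """
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if expected_revenue(m1, index, tail) < expected_revenue(m2, index, tail):
            lo = m1  # the maximizer cannot lie to the left of m1
        else:
            hi = m2  # the maximizer cannot lie to the right of m2
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Print the oracle price for a few scalar index values.
    for u in (0.0, 1.0, 2.0, 5.0):
        print(u, round(oracle_price(u), 4))
```

In this toy setting the map from index to optimal price is one-dimensional and increasing, which is the kind of structure a coarse-to-fine, bin-by-bin learner can exploit.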
