Policy Optimization Using Semi-parametric Models for Dynamic Pricing
Jianqing Fan, Yongyi Guo, Mengxin Yu

TL;DR
This paper develops a semi-parametric model for dynamic pricing, combining statistical learning with online decision-making, achieving near-optimal regret bounds in revenue maximization under market noise.
Contribution
It introduces a novel semi-parametric approach for dynamic pricing that learns both parametric and nonparametric demand components simultaneously.
Findings
Achieves regret upper bounds close to the theoretical lower bound when the market noise distribution is sufficiently smooth.
Extends the model to handle dynamically dependent features under strong mixing conditions.
Provides a policy that adapts to different noise smoothness levels, improving revenue.
Abstract
In this paper, we study the contextual dynamic pricing problem where the market value of a product is linear in its observed features plus some market noise. Products are sold one at a time, and only a binary response indicating success or failure of a sale is observed. Our model setting is similar to Javanmard and Nazerzadeh [2019] except that we expand the demand curve to a semiparametric model and need to learn dynamically both parametric and nonparametric components. We propose a dynamic statistical learning and decision-making policy that combines semiparametric estimation from a generalized linear model with an unknown link and online decision-making to minimize regret (maximize revenue). Under mild conditions, we show that for a market noise c.d.f. F(·) with m-th order derivative (m ≥ 2), our policy achieves a regret upper bound of…
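The setting described in the abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' policy: the noise c.d.f. is taken to be known and Uniform[-1, 1], the oracle price is found by grid search, and the names `uniform_cdf`, `expected_revenue`, `oracle_price`, and `simulate_sale` are hypothetical. In the paper both the linear coefficients and the noise distribution are unknown and must be learned from the binary sale outcomes.

```python
import numpy as np

def uniform_cdf(z):
    # Assumed noise c.d.f. for illustration only: Uniform[-1, 1].
    return np.clip((np.asarray(z, dtype=float) + 1.0) / 2.0, 0.0, 1.0)

def expected_revenue(price, mean_value, noise_cdf):
    # A sale occurs when price <= mean_value + noise,
    # which has probability 1 - F(price - mean_value) for a continuous F.
    return price * (1.0 - noise_cdf(price - mean_value))

def oracle_price(mean_value, noise_cdf, grid):
    # Benchmark price that maximizes expected revenue over a price grid,
    # assuming the linear part and the noise c.d.f. are both known.
    revenues = expected_revenue(grid, mean_value, noise_cdf)
    return float(grid[np.argmax(revenues)])

def simulate_sale(theta, x, price, noise):
    # Market value is linear in the features plus noise; the seller
    # only observes the binary outcome `sold`, never the value itself.
    value = float(np.dot(x, theta)) + noise
    sold = int(price <= value)
    return sold, price * sold

theta = np.array([1.0, 0.5])        # hypothetical coefficients
x = np.array([1.0, 2.0])            # hypothetical features, x @ theta = 2.0
grid = np.linspace(0.0, 4.0, 401)   # candidate prices

p_star = oracle_price(float(x @ theta), uniform_cdf, grid)
sold, revenue = simulate_sale(theta, x, p_star, noise=0.3)
```

The per-round regret of any posted price `p` in this sketch is `expected_revenue(p_star, ...) - expected_revenue(p, ...)`, and the cumulative regret is the sum of these gaps over the rounds played.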
Taxonomy
Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Optimization and Search Problems
