$\epsilon$-Policy Gradient for Online Pricing

Lukasz Szpruch; Tanut Treetanthiploet; Yufei Zhang

arXiv:2405.03624 · cs.LG · May 7, 2024

TL;DR

This paper introduces an $\epsilon$-policy gradient algorithm for online pricing that combines model-based and model-free reinforcement learning. By balancing exploration and exploitation, it achieves an expected regret of order $O(\sqrt{T})$ over $T$ trials, up to a logarithmic factor.

Contribution

It proposes an $\epsilon$-policy gradient method that extends the $\epsilon$-greedy algorithm by replacing greedy exploitation with a gradient descent step informed by model inference, and it analyzes the method's regret in online pricing.

Findings

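1. Achieves an expected regret of order $O(\sqrt{T})$ over $T$ trials, up to a logarithmic factor.
2. Balances exploration and exploitation by quantifying the exploration cost through the exploration probability $\epsilon$ and the exploitation cost through the gradient descent optimization and gradient estimation errors.
3. Provides a theoretical analysis of the regret of the proposed algorithm.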

Abstract

Combining model-based and model-free reinforcement learning approaches, this paper proposes and analyzes an $\epsilon$-policy gradient algorithm for the online pricing learning task. The algorithm extends the $\epsilon$-greedy algorithm by replacing greedy exploitation with a gradient descent step and facilitates learning via model inference. We optimize the regret of the proposed algorithm by quantifying the exploration cost in terms of the exploration probability $\epsilon$ and the exploitation cost in terms of the gradient descent optimization and gradient estimation errors. The algorithm achieves an expected regret of order $O(\sqrt{T})$ (up to a logarithmic factor) over $T$ trials.
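The abstract describes the algorithm only at a high level. The Python code below is a minimal sketch of the general idea on a toy pricing problem, not the authors' implementation, and it rests on illustrative assumptions that do not come from the paper: demand is linear in price with Gaussian noise, $d = a - bp + \text{noise}$; the policy is a single deterministic price; the exploration probability decays as $\epsilon_t = t^{-1/2}$; and model inference is ordinary least squares on all observed (price, demand) pairs, whose estimates $(\hat a, \hat b)$ give the gradient $\hat a - 2\hat b p$ of the estimated expected revenue $p(\hat a - \hat b p)$. The function `epsilon_policy_gradient` and all of its parameters are hypothetical.

```python
import random


def epsilon_policy_gradient(a, b, horizon, p_min=0.5, p_max=5.0,
                            step=0.05, noise=0.5, seed=0):
    """Toy epsilon-policy gradient loop (hypothetical setup, not the paper's).

    True demand: d = a - b * p + N(0, noise^2). The policy is one price
    parameter, updated by projected gradient ascent on the revenue
    p * (a_hat - b_hat * p) implied by least-squares estimates (a_hat, b_hat).
    Returns the final price and the number of exploration rounds.
    """
    rng = random.Random(seed)
    price = p_min  # current policy parameter
    # Running sums for ordinary least squares of demand on price.
    n = sum_p = sum_d = sum_pp = sum_pd = 0.0
    n_explore = 0
    for t in range(1, horizon + 1):
        eps = min(1.0, t ** -0.5)  # assumed exploration schedule
        if rng.random() < eps:
            p_t = rng.uniform(p_min, p_max)  # explore: uniformly random price
            n_explore += 1
        else:
            p_t = price  # exploit: play the current policy
        d_t = a - b * p_t + rng.gauss(0.0, noise)  # observed noisy demand

        # Model inference: update the OLS statistics with the new observation.
        n += 1.0
        sum_p += p_t
        sum_d += d_t
        sum_pp += p_t * p_t
        sum_pd += p_t * d_t
        s_pp = sum_pp - sum_p * sum_p / n
        if s_pp <= 1e-9:
            continue  # slope not identifiable without price variation
        slope = (sum_pd - sum_p * sum_d / n) / s_pp
        a_hat = (sum_d - slope * sum_p) / n
        b_hat = -slope

        # Gradient step on the estimated revenue, projected onto [p_min, p_max].
        grad = a_hat - 2.0 * b_hat * price
        price = min(p_max, max(p_min, price + step * grad))
    return price, n_explore


price, n_explore = epsilon_policy_gradient(a=10.0, b=2.0, horizon=20_000)
print(f"final price {price:.3f}, exploration rounds {n_explore}")
```

With $a = 10$ and $b = 2$ the revenue-maximizing price is $a/(2b) = 2.5$, and after 20,000 rounds the returned price should sit close to it, while the expected number of exploration rounds under this schedule is only about $2\sqrt{T}$. In the sketch, exploration rounds cost revenue but supply the price variation the regression needs to identify $b$, and exploitation rounds follow the estimated gradient, so their cost depends on how accurate $(\hat a, \hat b)$ are; this loosely mirrors the exploration and exploitation costs described in the abstract.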

Taxonomy

Topics: Auction Theory and Applications · Consumer Market Behavior and Pricing · Advanced Bandit Algorithms Research