Learning Personalized Ad Impact via Contextual Reinforcement Learning under Delayed Rewards
Yuwei Cheng, Zifeng Zhao, Haifeng Xu

TL;DR
This paper develops a reinforcement learning framework for personalized ad bidding that accounts for delayed, long-term, and heterogeneous effects, providing near-optimal strategies validated through simulations.
Contribution
It introduces a novel Contextual Markov Decision Process (CMDP) model with delayed rewards for ad bidding and proposes an efficient estimation procedure and an RL algorithm with theoretical regret guarantees.
Findings
Achieves a near-optimal regret bound of \(O(dH^2\sqrt{T})\)
Validates the approach through simulation experiments
Addresses long-term, delayed, and heterogeneous ad impacts in bidding strategies
Abstract
Online advertising platforms use automated auctions to connect advertisers with potential customers, requiring effective bidding strategies to maximize profits. Accurate ad impact estimation requires considering three key factors: delayed and long-term effects, cumulative ad impacts such as reinforcement or fatigue, and customer heterogeneity. However, these effects are often not jointly addressed in previous studies. To capture these factors, we model ad bidding as a Contextual Markov Decision Process (CMDP) with delayed Poisson rewards. For efficient estimation, we propose a two-stage maximum likelihood estimator combined with data-splitting strategies, ensuring controlled estimation error based on the first-stage estimator's (in)accuracy. Building on this, we design a reinforcement learning algorithm to derive efficient personalized bidding strategies. This approach achieves a…
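The two-stage, data-splitting idea mentioned in the abstract can be illustrated with a toy example. The sketch below is not the authors' estimator. It assumes a hypothetical model in which a customer with context x produces Poisson conversions on delay day k with rate exp(x·theta)·w_k, where the delay profile w sums to 1. Stage 1 estimates w on one half of the data. Stage 2 plugs that estimate into a Poisson maximum likelihood fit for theta on the other half, where only the first m delay days are assumed to be observed. All names (`simulate`, `stage1_delay_weights`, `stage2_theta`) and all parameter values are illustrative assumptions.

```python
import math
import random


def sample_poisson(rate, rng):
    """Draw one Poisson variate with Knuth's method (adequate for small rates)."""
    threshold, count, product = math.exp(-rate), 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate(n, theta, weights, rng):
    """Hypothetical data: a context (intercept, feature) and conversions per delay day."""
    data = []
    for _ in range(n):
        x = (1.0, rng.uniform(-1.0, 1.0))
        mu = math.exp(theta[0] * x[0] + theta[1] * x[1])
        counts = [sample_poisson(mu * w, rng) for w in weights]
        data.append((x, counts))
    return data


def stage1_delay_weights(data):
    """Stage 1: MLE of the delay profile, i.e. the share of conversions on each day."""
    num_days = len(data[0][1])
    day_totals = [sum(counts[k] for _, counts in data) for k in range(num_days)]
    total = sum(day_totals)
    return [c / total for c in day_totals]


def stage2_theta(data, observed_mass, steps=25):
    """Stage 2: Poisson MLE for theta via Newton's method, treating the
    stage-1 estimate `observed_mass` as a fixed exposure term."""
    theta = [0.0, 0.0]
    for _ in range(steps):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in data:
            rate = observed_mass * math.exp(theta[0] * x[0] + theta[1] * x[1])
            resid = y - rate
            g0 += resid * x[0]
            g1 += resid * x[1]
            h00 += rate * x[0] * x[0]
            h01 += rate * x[0] * x[1]
            h11 += rate * x[1] * x[1]
        det = h00 * h11 - h01 * h01
        # Newton update: theta += H^{-1} g for the 2x2 information matrix H.
        theta[0] += (h11 * g0 - h01 * g1) / det
        theta[1] += (h00 * g1 - h01 * g0) / det
    return theta


rng = random.Random(0)
true_theta = (0.5, 0.8)           # assumed values, for illustration only
true_weights = [0.5, 0.3, 0.2]    # assumed delay profile over three days
data = simulate(4000, true_theta, true_weights, rng)

# Data splitting: stage 1 and stage 2 use disjoint halves.
half_a, half_b = data[:2000], data[2000:]
w_hat = stage1_delay_weights(half_a)

m = 2  # in half_b, pretend only the first m delay days have been observed
observed_mass = sum(w_hat[:m])
stage2_data = [(x, sum(counts[:m])) for x, counts in half_b]
theta_hat = stage2_theta(stage2_data, observed_mass)

print("estimated delay profile:", w_hat)
print("estimated theta:", theta_hat)
```

Fitting the two stages on disjoint halves keeps the stage-1 estimate independent of the stage-2 sample, so any error in the estimated delay profile enters the second stage only through the fixed exposure term.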
Taxonomy
Topics: Consumer Market Behavior and Pricing · Auction Theory and Applications · Advanced Bandit Algorithms Research
