Learning Personalized Ad Impact via Contextual Reinforcement Learning under Delayed Rewards

Yuwei Cheng; Zifeng Zhao; Haifeng Xu

arXiv:2510.20055 · cs.LG · October 24, 2025

TL;DR

This paper develops a reinforcement learning framework for personalized ad bidding that accounts for delayed, long-term, and heterogeneous effects, providing near-optimal strategies validated through simulations.

Contribution

It introduces a novel CMDP model with delayed rewards for ad bidding and proposes an efficient estimation and RL algorithm with theoretical regret guarantees.

Findings

01

Achieves a near-optimal regret bound of \(O(dH^2\sqrt{T})\)

02

Validates the approach through simulation experiments

03

Addresses long-term, delayed, and heterogeneous ad impacts in bidding strategies
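
As a reading aid for the regret bound in finding 01, and assuming \(T\) counts the sequential learning episodes (this page does not define \(d\), \(H\), or \(T\)), dividing the cumulative bound by \(T\) gives the average regret per episode:

\[
\frac{\mathrm{Regret}(T)}{T}
= O\!\left(\frac{dH^2\sqrt{T}}{T}\right)
= O\!\left(\frac{dH^2}{\sqrt{T}}\right)
\longrightarrow 0
\quad \text{as } T \to \infty \text{ with } d, H \text{ fixed}.
\]

Under that reading, a \(\sqrt{T}\)-type bound means the average shortfall of the learned bidding policy relative to the optimal one vanishes at rate \(1/\sqrt{T}\).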

Abstract

Online advertising platforms use automated auctions to connect advertisers with potential customers, requiring effective bidding strategies to maximize profits. Accurate ad impact estimation requires considering three key factors: delayed and long-term effects, cumulative ad impacts such as reinforcement or fatigue, and customer heterogeneity. However, these effects are often not jointly addressed in previous studies. To capture these factors, we model ad bidding as a Contextual Markov Decision Process (CMDP) with delayed Poisson rewards. For efficient estimation, we propose a two-stage maximum likelihood estimator combined with data-splitting strategies, ensuring controlled estimation error based on the first-stage estimator's (in)accuracy. Building on this, we design a reinforcement learning algorithm to derive efficient personalized bidding strategies. This approach achieves a…
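
The two-stage, data-splitting idea in the abstract can be illustrated with a toy sketch. This is not the paper's estimator. It assumes a simplified model, invented here for illustration, in which a customer with context `x` produces conversions at lag `k` as `Poisson(w[k] * exp(x @ theta))`, and only the first `horizon` lags have been observed so far. Stage one estimates the delay profile `w` on one half of the data; stage two plugs it into a Poisson maximum likelihood fit for `theta` on the other half. All function names, parameters, and the simulated data are hypothetical.

```python
import numpy as np

def simulate(n, theta, w, rng):
    """Toy data: per-lag Poisson conversions, right-censored at a random horizon."""
    d = theta.size
    X = np.column_stack([np.ones(n), rng.normal(scale=0.5, size=(n, d - 1))])
    rate = np.exp(X @ theta)                         # expected total conversions
    counts = rng.poisson(rate[:, None] * w[None, :])  # shape (n, K)
    horizon = rng.integers(1, w.size + 1, size=n)     # number of lags observed, 1..K
    mask = np.arange(w.size)[None, :] < horizon[:, None]
    return X, counts * mask, horizon                  # unobserved lags are zeroed

def fit_poisson_mle(X, y, offset, n_iter=50, tol=1e-10):
    """Newton's method for log-link Poisson regression with a fixed offset."""
    theta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = np.exp(X @ theta + offset)
        grad = X.T @ (y - mu)                # score of the Poisson log-likelihood
        hess = X.T @ (X * mu[:, None])       # Fisher information
        step = np.linalg.solve(hess, grad)
        theta = theta + step
        if np.max(np.abs(step)) < tol:
            break
    return theta

def two_stage_estimate(X, counts, horizon, rng):
    n, K = counts.shape
    perm = rng.permutation(n)
    idx_a, idx_b = perm[: n // 2], perm[n // 2:]      # data splitting

    # Stage 1 (split A): delay profile from fully observed customers.
    # Given totals, lag counts are multinomial, so pooled proportions are the MLE.
    full_a = idx_a[horizon[idx_a] == K]
    lag_totals = counts[full_a].sum(axis=0)
    w_hat = lag_totals / lag_totals.sum()
    cum_w_hat = np.cumsum(w_hat)

    # Stage 2 (split B): observed total ~ Poisson(cum_w[horizon-1] * exp(x @ theta)),
    # so the estimated observed fraction enters as a log offset.
    y_b = counts[idx_b].sum(axis=1)
    offset_b = np.log(cum_w_hat[horizon[idx_b] - 1])
    theta_hat = fit_poisson_mle(X[idx_b], y_b, offset_b)
    return w_hat, theta_hat

rng = np.random.default_rng(0)
theta_true = np.array([0.2, 0.5, -0.3])   # hypothetical ground truth
w_true = np.array([0.5, 0.3, 0.2])        # hypothetical delay profile
X, counts, horizon = simulate(12000, theta_true, w_true, rng)
w_hat, theta_hat = two_stage_estimate(X, counts, horizon, rng)
print("delay profile estimate:", np.round(w_hat, 3))
print("theta estimate:", np.round(theta_hat, 3))
```

Fitting the two stages on disjoint halves keeps the stage-two likelihood independent of the noise in `w_hat`, which is the general motivation for sample splitting; any error in `w_hat` still propagates into `theta_hat` through the offset.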

Taxonomy

Topics: Consumer Market Behavior and Pricing · Auction Theory and Applications · Advanced Bandit Algorithms Research