Session-Level Dynamic Ad Load Optimization using Offline Robust Reinforcement Learning
Tao Liu, Qi Xu, Wei Shi, Zhigang Hua, Shuang Yang

TL;DR
This paper presents an offline robust reinforcement learning framework for session-level ad load optimization that effectively balances user experience and monetization, demonstrating significant offline and online performance improvements.
Contribution
The paper introduces a novel offline robust dueling DQN approach that mitigates confounding bias and enhances stability against distribution shifts in ad load optimization.
Findings
More than 80% offline gains over the best causal learning-based production baseline
Additional 5% offline gains with robustness enhancements
Double-digit improvements in the online engagement-ad score trade-off
Abstract
Session-level dynamic ad load optimization aims to personalize the density and types of delivered advertisements in real time during a user's online session by dynamically balancing user experience quality and ad monetization. Traditional causal learning-based approaches struggle with key technical challenges, especially in handling confounding bias and distribution shifts. In this paper, we develop an offline deep Q-network (DQN)-based framework that effectively mitigates confounding bias in dynamic systems and demonstrates more than 80% offline gains compared to the best causal learning-based production baseline. Moreover, to improve the framework's robustness against unanticipated distribution shifts, we further enhance our framework with a novel offline robust dueling DQN approach. This approach achieves more stable rewards on multiple OpenAI-Gym datasets as perturbations increase,…
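The abstract names a dueling DQN trained offline but does not give the architecture or the robustness mechanism. As a rough illustration only, the sketch below shows the standard mean-centred dueling aggregation, Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a'), and a one-step Q-learning target computed from a fixed batch of logged transitions. It is not the authors' robust variant; the function names, the three-action ad-load setup, and all numbers are hypothetical.

```python
import numpy as np

def dueling_q_values(state_value, advantages):
    """Combine V(s) and A(s, a) into Q(s, a) with the mean-centred dueling form."""
    state_value = np.asarray(state_value, dtype=float).reshape(-1, 1)
    advantages = np.asarray(advantages, dtype=float)
    # Subtracting the per-state mean advantage keeps V and A identifiable.
    return state_value + advantages - advantages.mean(axis=1, keepdims=True)

def td_targets(rewards, next_q, dones, gamma=0.99):
    """One-step Q-learning targets from a fixed (offline) batch of transitions."""
    rewards = np.asarray(rewards, dtype=float)
    dones = np.asarray(dones, dtype=float)
    # Terminal transitions (done = 1) do not bootstrap from the next state.
    return rewards + gamma * (1.0 - dones) * np.max(next_q, axis=1)

# Hypothetical batch: 2 logged session steps, 3 candidate ad-load actions.
next_q = dueling_q_values([1.0, 2.0], [[0.0, 1.0, 2.0], [3.0, 3.0, 3.0]])
targets = td_targets(rewards=[0.5, 1.0], next_q=next_q, dones=[0, 1], gamma=0.9)
print(next_q)
print(targets)
```

In a real offline setting the value and advantage streams would come from a neural network trained on logged sessions only, with no further environment interaction; the arrays here stand in for those network outputs.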
Taxonomy
Topics: Iterative Learning Control Systems · VLSI and FPGA Design Techniques · Scheduling and Optimization Algorithms
Methods: Q-Learning · Convolution · Dense Connections · Deep Q-Network
