Sustainable Online Reinforcement Learning for Auto-bidding
Zhiyu Mou, Yusen Huo, Rongquan Bai, Mingzhou Xie, Chuan Yu, Jian Xu, Bo Zheng

TL;DR
This paper introduces a sustainable online reinforcement learning framework for auto-bidding in advertising. The framework trains by interacting directly with the real advertising system rather than a virtual one, which closes the gap between virtual and real environments while keeping exploration safe.
Contribution
It proposes a novel online RL framework with a safe exploration policy and variance-suppressed conservative Q-learning, overcoming the offline-online discrepancy in auto-bidding.
Findings
Outperforms state-of-the-art auto-bidding algorithms in simulations
Demonstrates effectiveness in real-world advertising systems
Provides theoretical safety guarantees for the exploration policy
Abstract
Recently, the auto-bidding technique has become an essential tool to increase the revenue of advertisers. Facing the complex and ever-changing bidding environments in the real-world advertising system (RAS), state-of-the-art auto-bidding policies usually leverage reinforcement learning (RL) algorithms to generate real-time bids on behalf of the advertisers. Due to safety concerns, it was believed that the RL training process can only be carried out in an offline virtual advertising system (VAS) that is built based on the historical data generated in the RAS. In this paper, we argue that there exist significant gaps between the VAS and RAS, making the RL training process suffer from the problem of inconsistency between online and offline (IBOO). Firstly, we formally define the IBOO and systematically analyze its causes and influences. Then, to avoid the IBOO, we propose a sustainable online…
Taxonomy
Topics: Auction Theory and Applications · FinTech, Crowdfunding, Digital Finance · Consumer Market Behavior and Pricing
Methods: Q-Learning
