Q-Regularized Generative Auto-Bidding: From Suboptimal Trajectories to Optimal Policies
Mingming Zhang, Na Li, Zhuang Feiqing, Hongyang Zheng, Jiangbing Zhou, Wang Wuyin, Sheng-jie Sun, XiaoWei Chen, Junxiong Zhu, Lixin Zou, Chenliang Li

TL;DR
This paper introduces QGA, an auto-bidding method that combines Q-value regularization with a generative model. The learned policy draws on the experience in logged data while exploring beyond what that data covers, which leads to better advertising performance.
Contribution
QGA plugs Q-value regularization into a Decision Transformer backbone and adds a dual-exploration mechanism, so that policy optimization is less constrained by the suboptimal trajectories in the training data.
Findings
QGA outperforms existing methods on benchmarks and simulations.
In real-world A/B tests, QGA increases Ad GMV by 3.27%.
QGA improves Ad ROI by 2.49%.
Abstract
With the rapid development of e-commerce, auto-bidding has become a key asset in optimizing advertising performance under diverse advertiser environments. Current approaches focus on reinforcement learning (RL) and generative models. These efforts imitate offline historical behaviors using complex structures that require expensive hyperparameter tuning. Suboptimal trajectories further exacerbate the difficulty of policy learning. To address these challenges, we propose QGA, a novel Q-value regularized Generative Auto-bidding method. In QGA, we plug a Q-value regularization term with a double Q-learning strategy into the Decision Transformer backbone. This design enables joint optimization of policy imitation and action-value maximization, allowing the learned bidding policy to both leverage experience from the dataset and alleviate the adverse impact of the suboptimal…
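The joint objective described in the abstract (imitate the logged action while maximizing an action value) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function names, the mean-squared-error imitation term, the `weight` coefficient, the toy critics, and the use of the minimum of two critics as the double Q-learning estimate.

```python
from typing import Callable, Sequence

Vector = Sequence[float]
QFunction = Callable[[Vector, Vector], float]


def imitation_loss(predicted: Vector, logged: Vector) -> float:
    """Mean squared error between the predicted action and the logged action."""
    return sum((p - a) ** 2 for p, a in zip(predicted, logged)) / len(logged)


def q_regularized_loss(
    state: Vector,
    predicted: Vector,
    logged: Vector,
    q1: QFunction,
    q2: QFunction,
    weight: float = 0.5,  # hypothetical trade-off coefficient
) -> float:
    """Imitation loss minus a weighted action-value term.

    Assumption: the value estimate is the minimum of two critics
    (a clipped double-Q estimate); the paper may use a different form.
    """
    q_value = min(q1(state, predicted), q2(state, predicted))
    return imitation_loss(predicted, logged) - weight * q_value


# Toy critics (hypothetical): both prefer a bid action close to 1.0.
def toy_q1(state: Vector, action: Vector) -> float:
    return -((action[0] - 1.0) ** 2)


def toy_q2(state: Vector, action: Vector) -> float:
    return -((action[0] - 1.0) ** 2) - 0.1


state = [0.0]
logged_action = [0.0]  # a suboptimal logged bid

# Copying the logged action exactly versus moving part-way toward the
# action the critics prefer.
loss_copy = q_regularized_loss(state, [0.0], logged_action, toy_q1, toy_q2)
loss_shifted = q_regularized_loss(state, [0.5], logged_action, toy_q1, toy_q2)
```

In this toy setting, the shifted action incurs a small imitation penalty but a larger gain from the value term, so `loss_shifted` is lower than `loss_copy`. This is the sense in which a value-based regularizer can pull a policy away from suboptimal logged behavior.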
Taxonomy
Topics: Reinforcement Learning in Robotics · Generative Adversarial Networks and Image Synthesis · Advanced Bandit Algorithms Research
