Q-Regularized Generative Auto-Bidding: From Suboptimal Trajectories to Optimal Policies

Mingming Zhang; Na Li; Zhuang Feiqing; Hongyang Zheng; Jiangbing Zhou; Wang Wuyin; Sheng-jie Sun; XiaoWei Chen; Junxiong Zhu; Lixin Zou; Chenliang Li

arXiv:2601.02754·cs.LG·February 4, 2026

TL;DR

This paper introduces QGA, an auto-bidding method that combines Q-value regularization with a generative model. The learned policy draws on logged experience while exploring beyond the limits of the data, which leads to better advertising performance.

Contribution

QGA integrates Q-value regularization into a Decision Transformer backbone and introduces a dual-exploration mechanism. Together these reduce the impact of suboptimal trajectories and strengthen policy optimization for auto-bidding.

Findings

01 QGA outperforms existing methods on benchmarks and simulations.

02 In real-world A/B tests, QGA increases Ad GMV by 3.27%.

03 QGA improves Ad ROI by 2.49%.

Abstract

With the rapid development of e-commerce, auto-bidding has become a key asset in optimizing advertising performance under diverse advertiser environments. Current approaches focus on reinforcement learning (RL) and generative models. These efforts imitate offline historical behaviors using complex structures that require expensive hyperparameter tuning. Suboptimal trajectories in the data further exacerbate the difficulty of policy learning. To address these challenges, we propose QGA, a novel Q-value regularized Generative Auto-bidding method. In QGA, we plug Q-value regularization with a double Q-learning strategy into the Decision Transformer backbone. This design enables joint optimization of policy imitation and action-value maximization, allowing the learned bidding policy to both leverage experience from the dataset and alleviate the adverse impact of the suboptimal…
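The joint objective described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact loss, so everything here is an assumption made for illustration: the function name `q_regularized_loss`, the weight `eta`, the squared-error imitation term, and the use of the minimum of two critics (the clipped form of double Q-learning) as the action-value estimate.

```python
from statistics import fmean


def q_regularized_loss(pred_bids, logged_bids, q1, q2, eta=0.5):
    """Hypothetical joint loss: imitation error minus a weighted Q-value term.

    pred_bids   -- bids proposed by the policy for each step
    logged_bids -- bids recorded in the offline dataset for the same steps
    q1, q2      -- two critics' value estimates for the proposed bids
    eta         -- assumed trade-off weight (not taken from the paper)
    """
    # Policy imitation: mean squared error against the logged bids.
    imitation = fmean([(p - b) ** 2 for p, b in zip(pred_bids, logged_bids)])
    # Pessimistic value estimate: element-wise minimum of the two critics.
    q_min = fmean([min(a, b) for a, b in zip(q1, q2)])
    # Minimizing this loss lowers imitation error and raises estimated value.
    return imitation - eta * q_min


loss = q_regularized_loss(
    pred_bids=[1.0, 2.0],
    logged_bids=[1.0, 4.0],
    q1=[3.0, 1.0],
    q2=[2.0, 5.0],
)
print(loss)
```

With `eta` set to 0 the sketch reduces to pure imitation of the logged bids. A larger `eta` pushes the policy toward bids the critics rate highly, which is one way a policy could improve on suboptimal trajectories in the data.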

Taxonomy

Topics: Reinforcement Learning in Robotics · Generative Adversarial Networks and Image Synthesis · Advanced Bandit Algorithms Research