Permutation Equivariant Model-based Offline Reinforcement Learning for Auto-bidding
Zhiyu Mou, Miao Xu, Wei Chen, Rongquan Bai, Chuan Yu, Jian Xu

TL;DR
This paper introduces PE-MORL, a permutation equivariant model-based offline RL algorithm for auto-bidding. It improves policy performance by learning an environment model that generalizes better and by pessimistically penalizing model errors during Q-learning.
Contribution
It proposes a permutation equivariant architecture for the learned environment model and a robust offline Q-learning method that pessimistically penalizes model errors, both aimed at making model-based offline RL reliable for auto-bidding.
Findings
PE-MORL outperforms existing auto-bidding methods in real-world tests.
The permutation equivariant model improves environment generalization.
Pessimistic error penalization enhances policy reliability.
Abstract
Reinforcement learning (RL) for auto-bidding has shifted from using simplistic offline simulators (Simulation-based RL Bidding, SRLB) to offline RL on fixed real datasets (Offline RL Bidding, ORLB). However, ORLB policies are limited by the dataset's state space coverage, offering modest gains. While SRLB expands state coverage, its simulator-reality gap risks misleading policies. This paper introduces Model-based RL Bidding (MRLB), which learns an environment model from real data to bridge this gap. MRLB trains policies using both real and model-generated data, expanding state coverage beyond ORLB. To ensure model reliability, we propose: 1) A permutation equivariant model architecture for better generalization, and 2) A robust offline Q-learning method that pessimistically penalizes model errors. These form the Permutation Equivariant Model-based Offline RL (PE-MORL) algorithm.…
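To make the term "permutation equivariant" concrete, the sketch below checks the property on a toy layer: reordering the inputs reorders the outputs in the same way. It is an illustration only. The abstract does not describe the paper's actual architecture or say which inputs are permuted, so the DeepSets-style layer here (a per-element linear map plus a linear map of the mean over all elements) is an assumption, and the names `equivariant_layer`, `W_self`, and `W_pool` are hypothetical.

```python
import random


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def equivariant_layer(X, W_self, W_pool):
    """Toy permutation equivariant layer (assumed form, not the paper's model).

    For a set of n feature vectors x_1..x_n, computes
        y_i = W_self @ x_i + W_pool @ mean_j(x_j).
    The mean ignores ordering, so permuting the inputs permutes the outputs.
    """
    n, dim = len(X), len(X[0])
    mean = [sum(x[k] for x in X) / n for k in range(dim)]
    pooled = matvec(W_pool, mean)
    return [[s + p for s, p in zip(matvec(W_self, x), pooled)] for x in X]


def permute(rows, perm):
    """Reorder rows so that position i holds rows[perm[i]]."""
    return [rows[i] for i in perm]


# Random toy inputs and weights; sizes are arbitrary.
rng = random.Random(0)
n, d_in, d_out = 5, 3, 2
X = [[rng.uniform(-1, 1) for _ in range(d_in)] for _ in range(n)]
W_self = [[rng.uniform(-1, 1) for _ in range(d_in)] for _ in range(d_out)]
W_pool = [[rng.uniform(-1, 1) for _ in range(d_in)] for _ in range(d_out)]

perm = [3, 0, 4, 1, 2]
out_then_perm = permute(equivariant_layer(X, W_self, W_pool), perm)
perm_then_out = equivariant_layer(permute(X, perm), W_self, W_pool)

# Equivariance: f(permute(X)) equals permute(f(X)) up to float rounding.
max_gap = max(
    abs(a - b)
    for row_a, row_b in zip(out_then_perm, perm_then_out)
    for a, b in zip(row_a, row_b)
)
print("equivariant:", max_gap < 1e-9)
```

A model built from layers of this kind cannot depend on the arbitrary order in which its inputs are listed, which is one plausible reading of why such a constraint would help an environment model generalize.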
