Permutation Equivariant Model-based Offline Reinforcement Learning for Auto-bidding

Zhiyu Mou; Miao Xu; Wei Chen; Rongquan Bai; Chuan Yu; Jian Xu

arXiv:2506.17919·cs.LG·June 24, 2025

TL;DR

This paper introduces PE-MORL, a permutation equivariant model-based offline RL algorithm for auto-bidding. It trains policies on both real and model-generated data, using a permutation equivariant environment model for better generalization and a pessimistic penalty on model errors.

Contribution

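The paper proposes a permutation equivariant environment model architecture and a robust offline Q-learning method that pessimistically penalizes model errors. Together these form PE-MORL, a model-based offline RL algorithm for auto-bidding.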

Findings

01. PE-MORL outperforms existing auto-bidding methods in real-world tests.

02. The permutation equivariant architecture improves the generalization of the learned environment model.

03. Pessimistically penalizing model errors makes the learned policy more reliable.
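
One common way to make the third finding concrete is to subtract an uncertainty estimate from the learning target. The sketch below uses disagreement among an ensemble of reward predictions as that estimate, in the spirit of uncertainty-penalized model-based offline RL. The source does not say how PE-MORL measures model error, so the function name `pessimistic_target`, the ensemble, and the coefficient `penalty_coef` are illustrative assumptions, not the paper's method.

```python
from statistics import mean, pstdev
from typing import Sequence


def pessimistic_target(
    reward_preds: Sequence[float],
    next_value: float,
    gamma: float = 0.9,
    penalty_coef: float = 0.5,
) -> float:
    """Bootstrapped target with an uncertainty penalty (illustrative only).

    reward_preds: rewards predicted for one (state, action) pair by several
        hypothetical environment models.
    next_value:   value estimate of the predicted next state.
    The penalty is the population standard deviation of the ensemble's
    reward predictions, an assumed stand-in for "model error".
    """
    expected_reward = mean(reward_preds)
    uncertainty = pstdev(reward_preds)  # 0.0 when all models agree
    return expected_reward - penalty_coef * uncertainty + gamma * next_value


# Models that agree incur no penalty; models that disagree lower the target.
agree = pessimistic_target([1.0, 1.0, 1.0], next_value=2.0)
disagree = pessimistic_target([0.0, 2.0], next_value=2.0)
print(agree, disagree)
```

Both calls have the same mean predicted reward, so any gap between the two targets comes from the penalty alone. A larger `penalty_coef` makes the policy more conservative on model-generated transitions the ensemble is unsure about.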

Abstract

Reinforcement learning (RL) for auto-bidding has shifted from using simplistic offline simulators (Simulation-based RL Bidding, SRLB) to offline RL on fixed real datasets (Offline RL Bidding, ORLB). However, ORLB policies are limited by the dataset's state space coverage, offering modest gains. While SRLB expands state coverage, its simulator-reality gap risks misleading policies. This paper introduces Model-based RL Bidding (MRLB), which learns an environment model from real data to bridge this gap. MRLB trains policies using both real and model-generated data, expanding state coverage beyond ORLB. To ensure model reliability, we propose: 1) A permutation equivariant model architecture for better generalization, and 2) A robust offline Q-learning method that pessimistically penalizes model errors. These form the Permutation Equivariant Model-based Offline RL (PE-MORL) algorithm.…
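
To illustrate what permutation equivariance means for a model, here is a minimal sketch of a DeepSets-style layer: reordering the input elements reorders the outputs in the same way and changes nothing else. The abstract does not describe the actual PE-MORL architecture or say which inputs are treated as a set, so the layer form, the names `equivariant_layer`, `w_self`, and `w_pool`, and the toy data are assumptions made for illustration.

```python
from typing import List, Sequence

Vector = List[float]
Matrix = List[Vector]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply matrix w (a list of rows) by vector x."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in w]


def equivariant_layer(xs: List[Vector], w_self: Matrix, w_pool: Matrix) -> List[Vector]:
    """Compute out_i = w_self @ x_i + w_pool @ mean(xs) for every element.

    The mean is unchanged by reordering xs and each element is transformed
    with the same weights, so the layer is permutation equivariant.
    """
    n = len(xs)
    dim = len(xs[0])
    mean_x = [sum(x[d] for x in xs) / n for d in range(dim)]
    pooled = matvec(w_pool, mean_x)
    return [[s + p for s, p in zip(matvec(w_self, x), pooled)] for x in xs]


def permute(items: Sequence, order: Sequence[int]) -> list:
    """Return items rearranged so that position k holds items[order[k]]."""
    return [items[i] for i in order]


# Toy check: permuting the inputs permutes the outputs identically.
xs = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
w_self = [[1.0, 0.0], [0.5, -1.0]]
w_pool = [[0.25, 0.25], [0.0, 1.0]]
order = [2, 0, 1]

out_then_permute = permute(equivariant_layer(xs, w_self, w_pool), order)
permute_then_out = equivariant_layer(permute(xs, order), w_self, w_pool)
print(out_then_permute == permute_then_out)
```

Sharing weights across elements means the model cannot memorize the arbitrary ordering present in the training data, which is one plausible reason such a constraint would help an environment model generalize.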
