Multi-task Offline Reinforcement Learning for Online Advertising in Recommender Systems

Langming Liu; Wanyu Wang; Chi Zhang; Bo Li; Hongzhi Yin; Xuetao Wei; Wenbo Su; Bo Zheng; Xiangyu Zhao

arXiv:2506.23090·cs.IR·July 10, 2025

TL;DR

This paper introduces MTORL, a multi-task offline reinforcement learning framework tailored for online advertising, effectively handling sparse data, overestimation, and budget constraints through causal modeling and multi-task learning.

Contribution

The paper proposes a novel multi-task offline RL model with a causal state encoder and attention mechanisms, specifically designed for online advertising challenges.

Findings

01 MTORL outperforms existing methods in offline and online tests.

02 Causal attention improves user interest modeling.

03 Multi-task learning enhances recommendation and budget management.

Abstract

Online advertising in recommendation platforms has gained significant attention, with a predominant focus on channel recommendation and budget allocation strategies. However, current offline reinforcement learning (RL) methods face substantial challenges when applied to sparse advertising scenarios, primarily due to severe overestimation, distributional shift, and neglect of budget constraints. To address these issues, we propose MTORL, a novel multi-task offline RL model that targets two key objectives. First, we establish a Markov Decision Process (MDP) framework tailored to the nuances of advertising. Then, we develop a causal state encoder to capture dynamic user interests and temporal dependencies, facilitating offline RL through conditional sequence modeling. Causal attention mechanisms are introduced to enhance user sequence representations by identifying correlations among…
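
As a rough illustration of the two ideas the abstract names, the sketch below shows the ingredients commonly used in conditional sequence modeling for offline RL: a return-to-go signal that each timestep can be conditioned on, and a lower-triangular attention mask that stops a position from attending to later ones. This is not MTORL's implementation. The function names `returns_to_go` and `causal_mask` are hypothetical, the discount `gamma` is an assumed parameter, and reading "causal attention" as autoregressive masking is an assumption that the abstract does not confirm.

```python
from typing import List


def returns_to_go(rewards: List[float], gamma: float = 1.0) -> List[float]:
    """Return the (optionally discounted) sum of future rewards at each step.

    Hypothetical helper: rtg[t] = rewards[t] + gamma * rtg[t + 1].
    """
    rtg = [0.0] * len(rewards)
    running = 0.0
    # Walk the trajectory backwards so each step reuses the later sum.
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        rtg[t] = running
    return rtg


def causal_mask(n: int) -> List[List[bool]]:
    """Build an n x n mask where entry [i][j] is True iff j <= i.

    Assumption: "causal" is read here as autoregressive masking, so
    position i may only attend to itself and earlier positions.
    """
    return [[j <= i for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    # A toy, made-up reward sequence (e.g. sparse conversions in a session).
    print(returns_to_go([0.0, 1.0, 0.0, 2.0]))
    for row in causal_mask(3):
        print(row)
```

With sparse rewards, the return-to-go stays informative at steps where the immediate reward is zero, which is one reason sequence-modeling approaches are often considered for sparse settings like the advertising scenario described here.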
