Learning List-wise Representation in Reinforcement Learning for Ads Allocation with Multiple Auxiliary Tasks
Ze Wang, Guogang Liao, Xiaowen Shi, Xiaoxu Wu, Chuheng Zhang, Yongkang Wang, Xingxing Wang, Dong Wang

TL;DR
This paper introduces a reinforcement learning approach for ads allocation that learns improved list-wise representations using auxiliary tasks, resulting in higher revenue and better generalization in recommendation systems.
Contribution
It proposes a novel RL method with auxiliary tasks for list-wise ads allocation, enhancing representation learning and sample efficiency.
Findings
Improved list-wise representations lead to higher platform revenue.
Auxiliary tasks significantly enhance RL agent performance.
The proposed method outperforms state-of-the-art baselines in experiments.
Abstract
With the recent prevalence of reinforcement learning (RL), there has been tremendous interest in utilizing RL for ads allocation in recommendation platforms (e.g., e-commerce and news feed sites). To achieve better allocation, the input of recent RL-based ads allocation methods has been upgraded from a point-wise single item to a list-wise item arrangement. However, this also results in a high-dimensional space of state-action pairs, making it difficult to learn list-wise representations with good generalization ability. This further hinders the exploration of RL agents and causes poor sample efficiency. To address this problem, we propose a novel RL-based approach for ads allocation which learns better list-wise representations by leveraging task-specific signals on the Meituan food delivery platform. Specifically, we propose three different auxiliary tasks based on reconstruction, prediction, and…
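To give a rough sense of why moving from point-wise to list-wise decisions enlarges the action space, the sketch below counts possible ad placements. It is an illustration only: the slot counts, the cap on ads per list, and the function names are assumptions made for this example and are not taken from the paper, whose actual action definition may differ.

```python
from math import comb


def pointwise_actions() -> int:
    """Point-wise view: one binary choice (ad or organic item) for a single slot."""
    return 2


def listwise_actions(num_slots: int, max_ads: int) -> int:
    """List-wise view (hypothetical formulation): choose which of `num_slots`
    positions hold ads, with at most `max_ads` ads in the whole list."""
    return sum(comb(num_slots, k) for k in range(max_ads + 1))


if __name__ == "__main__":
    # Assumed list lengths, with at most half of the slots allowed to be ads.
    for slots in (5, 10, 20):
        count = listwise_actions(slots, max_ads=slots // 2)
        print(f"{slots} slots: {count} list-wise actions "
              f"vs {pointwise_actions()} point-wise actions")
```

Under these assumptions the count grows from 16 arrangements for a 5-slot list to 616666 for a 20-slot list, which is consistent with the abstract's point that list-wise inputs make generalization and exploration harder.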
Taxonomy
Topics: Mobile Crowdsensing and Crowdsourcing · Recommender Systems and Techniques · Advanced Bandit Algorithms Research
Methods: Contrastive Learning
