UnifiedRL: A Reinforcement Learning Algorithm Tailored for Multi-Task Fusion in Large-Scale Recommender Systems

Peng Liu; Cong Xu; Ming Zhao; Jiawei Zhu; Bin Wang; Yi Ren

arXiv:2404.17589 · cs.IR · September 25, 2025

TL;DR

UnifiedRL is a reinforcement learning algorithm designed specifically for multi-task fusion in large-scale recommender systems. It addresses key limitations of existing offline RL methods and, in real-world deployment, increased user valid consumption by 4.64% and user duration time by 1.74%.

Contribution

UnifiedRL introduces a seamless integration of offline RL with a custom exploration policy, enabling efficient online exploration and superior multi-task fusion in recommender systems.

Findings

1. Achieved a +4.64% increase in user valid consumption
2. Achieved a +1.74% increase in user duration time
3. Successfully deployed in multiple large-scale RSs since June 2023

Abstract

As the final pivotal stage of a Recommender System (RS), Multi-Task Fusion (MTF) is responsible for combining the multiple scores output by a Multi-Task Learning (MTL) model into a final score that maximizes user satisfaction. Recently, Reinforcement Learning (RL) has been used for MTF in RSs to optimize long-term user satisfaction. However, the existing offline RL algorithms used for MTF have the following severe problems: a) to avoid Out-of-Distribution (OOD) issues, their constraints are overly strict, which seriously damages performance; b) they are unaware of the exploration policy used to collect the training data, so only a suboptimal policy can be learned; c) their exploration policies are inefficient and hurt user experience. To solve these problems, we propose an innovative method called UnifiedRL, tailored for MTF in large-scale RSs. UnifiedRL seamlessly integrates the offline RL model with its custom…
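
The fusion step described in the abstract can be illustrated with a minimal sketch. The abstract does not give UnifiedRL's fusion formula, so the weighted-product form below, the task names (`click`, `watch`), and the helper names `fuse_scores` and `rank` are all assumptions chosen for illustration. In an RL formulation of MTF, the weights would presumably be the action chosen by the policy; here they are simply passed in by hand.

```python
import math

def fuse_scores(scores, weights):
    """Combine per-task scores into one ranking score.

    Assumption (not from the paper): a weighted product,
    prod(score_k ** weight_k), computed in log space for stability.
    """
    eps = 1e-9  # guards against log(0) for zero scores
    log_sum = sum(w * math.log(scores[task] + eps) for task, w in weights.items())
    return math.exp(log_sum)

def rank(candidates, weights):
    """Sort candidate items by fused score, highest first."""
    return sorted(
        candidates,
        key=lambda c: fuse_scores(c["scores"], weights),
        reverse=True,
    )

# Hypothetical MTL outputs for two candidate items.
candidates = [
    {"id": "a", "scores": {"click": 0.9, "watch": 0.1}},
    {"id": "b", "scores": {"click": 0.5, "watch": 0.5}},
]

# Equal weights favour the balanced item "b" (0.25 vs 0.09).
balanced = rank(candidates, {"click": 1.0, "watch": 1.0})

# Putting all weight on "click" favours item "a" (0.9 vs 0.5).
click_only = rank(candidates, {"click": 1.0, "watch": 0.0})
```

Changing the weights changes the ranking without retraining the MTL model, which is why the choice of fusion weights is a natural target for a learned policy.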

Taxonomy

Topics: Recommender Systems and Techniques · Reinforcement Learning in Robotics