A Production-Ready RL Framework for Personalized Utility Tuning with Pareto Sweeping in Pinterest Recommender Systems

Yichu Zhou; Mehdi Ben Ayed; Lin Yang; Jiacong He; Andreanne Lemay; Jiaye Wang; Jaewon Yang; Josie Zeng; Dhruvil Deven Badani; Yijie Dylan Wang; Jiajing Xu; Charles Rosenberg

arXiv:2605.16344 · cs.IR · May 19, 2026

TL;DR

PRL-PUTS is a production-ready reinforcement learning framework that optimizes multi-objective utility weights in recommender systems, enabling instant policy updates and improved user engagement.

Contribution

PRL-PUTS introduces a ranker-independent RL approach with Pareto sweeping for real-time utility tuning in large-scale recommender systems.

Findings

01. Offline analysis of unbiased exploration logs validates the approach.

02. Online experiments on Pinterest demonstrate a +0.13% increase in successful sessions.

03. The framework operates without adding serving latency.

Abstract

Large-scale recommenders encode multi-objective trade-offs by combining multiple predicted outcomes into a single utility score. Although this utility layer can be updated independently of the ranker, weight tuning remains largely manual, globally applied, slow to adapt to changing environments and business needs, and hard to govern as priorities shift. We propose PRL-PUTS, a Production-ready, ranker-independent RL framework for Personalized Utility-weight Tuning with Pareto Sweeping. We cast utility tuning as a one-step, value-based RL problem: given request context, an agent selects a utility-weight vector that re-weights ranker predictions to maximize request-level engagement rewards. To visualize performance across the trade-off spectrum and allow decision makers to update the deployed operating policy instantly, we adopt inference-time Pareto frontier sweeping via a…
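To make the one-step, value-based framing in the abstract concrete, here is a minimal Python sketch. It is not the paper's implementation: the discrete set of candidate weight vectors, the string-valued request contexts, the two predicted outcomes per item, and the tabular mean-reward estimator are all assumptions chosen for illustration, and every name in it (`CANDIDATE_WEIGHTS`, `fit_q_table`, `select_action`, `rank`) is hypothetical. The Pareto sweeping step is left out because the abstract is truncated before describing it.

```python
from typing import Dict, Hashable, List, Sequence, Tuple

# Hypothetical action set: each action is a utility-weight vector applied to
# the ranker's predicted outcomes, here assumed to be (p_save, p_click).
CANDIDATE_WEIGHTS: List[Tuple[float, float]] = [(1.0, 0.0), (0.5, 0.5), (0.0, 1.0)]

LogRow = Tuple[Hashable, int, float]  # (request context, action index, reward)


def utility(predictions: Sequence[float], weights: Sequence[float]) -> float:
    """Combine one item's predicted outcomes into a single utility score."""
    return sum(w * p for w, p in zip(weights, predictions))


def rank(items: Dict[str, Sequence[float]], weights: Sequence[float]) -> List[str]:
    """Order item ids by descending utility under the given weight vector."""
    return sorted(items, key=lambda i: utility(items[i], weights), reverse=True)


def fit_q_table(logs: Sequence[LogRow]) -> Dict[Tuple[Hashable, int], float]:
    """Estimate Q(context, action) as the mean logged request-level reward.

    The problem is one-step, so there is no bootstrapping from a next state;
    the value of an action is just its expected immediate reward.
    """
    totals: Dict[Tuple[Hashable, int], Tuple[float, int]] = {}
    for context, action, reward in logs:
        running_sum, count = totals.get((context, action), (0.0, 0))
        totals[(context, action)] = (running_sum + reward, count + 1)
    return {key: s / n for key, (s, n) in totals.items()}


def select_action(q: Dict[Tuple[Hashable, int], float], context: Hashable) -> int:
    """Greedy choice: the action index with the highest estimated value.

    Unseen (context, action) pairs default to 0.0 in this toy version.
    """
    return max(range(len(CANDIDATE_WEIGHTS)), key=lambda a: q.get((context, a), 0.0))


# Toy exploration logs, invented for illustration only.
logs: List[LogRow] = [
    ("new_user", 0, 0.0),
    ("new_user", 2, 1.0),
    ("new_user", 2, 0.0),
    ("core_user", 0, 1.0),
    ("core_user", 1, 0.0),
]
q_table = fit_q_table(logs)

# Ranker predictions are taken as given; only the weights on top of them change.
items = {"pin_a": (0.9, 0.1), "pin_b": (0.2, 0.8)}
new_user_action = select_action(q_table, "new_user")
core_user_action = select_action(q_table, "core_user")
new_user_ranking = rank(items, CANDIDATE_WEIGHTS[new_user_action])
core_user_ranking = rank(items, CANDIDATE_WEIGHTS[core_user_action])
```

In this sketch the ranker's predictions are never modified, which mirrors the ranker-independent property the abstract describes: personalizing the outcome only requires picking a different weight vector per request. A production system would presumably replace the lookup table with a learned value model over richer context features.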
