RLPF: Reinforcement Learning from Prediction Feedback for User Summarization with LLMs
Jiaxing Wu, Lin Ning, Luyang Liu, Harrison Lee, Neo Wu, Chao Wang, Sushant Prakash, Shawn O'Banion, Bradley Green, Jun Xie

TL;DR
This paper introduces RLPF, a reinforcement learning method that fine-tunes LLMs to generate concise, informative user summaries optimized for downstream tasks, significantly improving personalization effectiveness and summary quality.
Contribution
RLPF is a novel reinforcement learning approach that enhances LLM-generated user summaries for personalization by optimizing for downstream task performance and summary quality.
Findings
Up to 22% improvement in downstream task performance
Achieved 84.59% win rate on summary quality metrics
Reduced context length by 74% while maintaining performance
Abstract
LLM-powered personalization agent systems employ Large Language Models (LLMs) to predict users' behavior from their past activities. However, their effectiveness often hinges on how well they can leverage extensive user history data, which is inherently noisy and long. Existing pretrained LLMs may generate summaries that are concise but lack the necessary context for downstream tasks, hindering their utility in personalization systems. To address these challenges, we introduce Reinforcement Learning from Prediction Feedback (RLPF). RLPF fine-tunes LLMs to generate concise, human-readable user summaries that are optimized for downstream task performance. By maximizing the usefulness of the generated summaries, RLPF effectively distills extensive user history data while preserving essential information for downstream tasks. Our empirical evaluation…
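The abstract describes rewarding a summarizer according to how useful its summary is for a downstream prediction task. A minimal sketch of what such a prediction-feedback reward could look like follows. It is not the paper's implementation: the function names, the exact-match scoring, the word budget, and the length penalty are all illustrative assumptions, and the toy predictor stands in for a frozen LLM. The reinforcement learning update that would consume this reward is not shown.

```python
from typing import Callable


def prediction_feedback_reward(
    summary: str,
    target: str,
    predict: Callable[[str], str],
    max_words: int = 50,         # hypothetical word budget
    length_weight: float = 0.5,  # hypothetical penalty weight
) -> float:
    """Illustrative reward: 1.0 if the predictor recovers the target
    from the summary alone, minus a penalty for exceeding the budget."""
    # Prediction feedback: does the summary carry enough signal?
    guess = predict(summary).strip().lower()
    correct = 1.0 if guess == target.strip().lower() else 0.0

    # Conciseness term (an assumption, not taken from the source).
    overflow = max(0, len(summary.split()) - max_words)
    penalty = length_weight * min(1.0, overflow / max_words)
    return correct - penalty


def toy_predictor(summary: str) -> str:
    # Stand-in for a frozen downstream model; purely hypothetical.
    return "sci-fi movie" if "sci-fi" in summary.lower() else "unknown"


if __name__ == "__main__":
    good = "User mostly watches sci-fi films."
    bad = "User likes cooking."
    print(prediction_feedback_reward(good, "sci-fi movie", toy_predictor))
    print(prediction_feedback_reward(bad, "sci-fi movie", toy_predictor))
```

In this sketch, a summary that lets the predictor recover the user's next activity scores higher than one that does not, and an overly long summary loses part of its reward even when the prediction is correct.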
Taxonomy
Topics: Recommender Systems and Techniques · Data Quality and Management · Data Mining Algorithms and Applications
