RLPF: Reinforcement Learning from Prediction Feedback for User Summarization with LLMs

Jiaxing Wu; Lin Ning; Luyang Liu; Harrison Lee; Neo Wu; Chao Wang; Sushant Prakash; Shawn O'Banion; Bradley Green; Jun Xie

arXiv:2409.04421·cs.CL·January 20, 2025

TL;DR

This paper introduces RLPF, a reinforcement learning method that fine-tunes LLMs to generate concise, informative user summaries optimized for downstream tasks, significantly improving personalization effectiveness and summary quality.

Contribution

RLPF is a novel reinforcement learning approach that enhances LLM-generated user summaries for personalization by optimizing for downstream task performance and summary quality.

Findings

01. Up to 22% improvement in downstream task performance

02. 84.59% win rate on summary quality metrics

03. 74% reduction in context length while maintaining performance

Abstract

LLM-powered personalization agent systems employ Large Language Models (LLMs) to predict users' behavior from their past activities. However, their effectiveness often hinges on the ability to leverage extensive user history, which is difficult because such data is long and inherently noisy. Existing pretrained LLMs may generate summaries that are concise but lack the context needed for downstream tasks, which limits their utility in personalization systems. To address these challenges, we introduce Reinforcement Learning from Prediction Feedback (RLPF). RLPF fine-tunes LLMs to generate concise, human-readable user summaries that are optimized for downstream task performance. By maximizing the usefulness of the generated summaries, RLPF distills extensive user history data while preserving the information essential for downstream tasks. Our empirical evaluation…
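
The idea of "prediction feedback" described in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation: the function names, the exact-match task reward, the word-count length penalty, and the toy predictor are all hypothetical choices made for illustration. The sketch only shows the general shape of such a reward: a summary scores well when a downstream predictor, given only the summary, recovers the user's actual behavior, and it loses some reward when it grows too long.

```python
from typing import Callable


def prediction_feedback_reward(
    summary: str,
    target: str,
    predictor: Callable[[str], str],
    max_words: int = 50,          # hypothetical length budget
    length_weight: float = 0.1,   # hypothetical penalty weight
) -> float:
    """Score a summary by how well a downstream predictor does with it.

    Assumption: the task reward is a 0/1 exact match and conciseness is
    encouraged with a linear penalty on words over the budget. The paper's
    actual reward may differ.
    """
    # The predictor sees only the summary, never the raw user history.
    prediction = predictor(summary)
    task_reward = 1.0 if prediction.strip().lower() == target.strip().lower() else 0.0

    # Penalize only the words that exceed the budget.
    excess_words = max(0, len(summary.split()) - max_words)
    length_penalty = length_weight * excess_words / max_words

    return task_reward - length_penalty


def toy_predictor(summary: str) -> str:
    """Stand-in for a frozen downstream LLM (hypothetical)."""
    return "sci-fi" if "sci-fi" in summary.lower() else "drama"


if __name__ == "__main__":
    good = "User mostly watches sci-fi films and rates them highly."
    vague = "User watches many films."
    print(prediction_feedback_reward(good, "sci-fi", toy_predictor))
    print(prediction_feedback_reward(vague, "sci-fi", toy_predictor))
```

In a reinforcement learning loop, a scalar like this would be fed to a policy-gradient update of the summarizer model. That training step is omitted here because the text above does not specify it.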

Taxonomy

Topics: Recommender Systems and Techniques · Data Quality and Management · Data Mining Algorithms and Applications