An Extremely Data-efficient and Generative LLM-based Reinforcement Learning Agent for Recommenders

Shuang Feng; Grace Feng

arXiv:2408.16032·cs.LG·August 30, 2024

Open Access

TL;DR

This paper presents a highly data-efficient RL agent for recommender systems using LLMs, achieving competitive performance with minimal training data and time, by fine-tuning pre-trained models and employing preference-based training methods.

Contribution

It introduces a low-cost, data-efficient RL training approach for recommender systems using generative trajectories and preference optimization techniques with LLMs.
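
The preference optimization mentioned above includes Direct Preference Optimization (DPO), which is also listed under Methods. As a rough illustration of the standard DPO objective for a single preference pair, and not the authors' implementation, the sketch below computes the loss from summed sequence log-probabilities under the policy being trained and a frozen reference model. The function name `dpo_loss`, its argument names, and the default `beta=0.1` are assumptions made for this example.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) trajectory pair.

    Each argument is the log-probability of a full trajectory under either
    the policy being trained or the frozen reference model. `beta` scales
    how strongly the policy is pushed away from the reference (hypothetical
    default, not taken from the paper).
    """
    # Implicit rewards: how much more likely each trajectory is under the
    # policy than under the reference model.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    z = beta * (chosen_margin - rejected_margin)

    # loss = -log(sigmoid(z)) = log(1 + exp(-z)), computed stably.
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


if __name__ == "__main__":
    # Policy identical to the reference: the loss equals log(2).
    print(dpo_loss(-5.0, -7.0, -5.0, -7.0))
    # Policy favors the chosen trajectory more than the reference does:
    # the loss drops below log(2).
    print(dpo_loss(-4.0, -8.0, -5.0, -7.0))
```

In a setting like the one the paper describes, the chosen and rejected items would be trajectories for the same instruction; how those pairs are built is not specified in this summary.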

Findings

01. Generated trajectories match human data in task performance.

02. The DPO agent achieved a 19% success rate with under 30 minutes of training.

03. Limited training time sufficed for competitive results.

Abstract

Recent advancements in large language models (LLMs) have enabled understanding webpage contexts, product details, and human instructions. Utilizing LLMs as the foundational architecture for either reward models or policies in reinforcement learning has gained popularity -- a notable achievement is the success of InstructGPT. RL algorithms have been instrumental in maximizing long-term customer satisfaction and avoiding short-term, myopic goals in industrial recommender systems, which often rely on deep learning models to predict immediate clicks or purchases. In this project, several RL methods are implemented and evaluated using the WebShop benchmark environment, data, simulator, and pre-trained model checkpoints. The goal is to train an RL agent to maximize the purchase reward given a detailed human instruction describing a desired product. The RL agents are developed by fine-tuning…

Taxonomy

Topics: Recommender Systems and Techniques

Methods: Direct Preference Optimization · Softmax · Linear Layer · Dropout · Adam · Layer Normalization · Weight Decay · Attention Is All You Need · Dense Connections