Large Language Model-Enhanced Reinforcement Learning for Diverse and Novel Recommendations

Jiin Woo; Alireza Bagheri Garakani; Tianchen Zhou; Zhishen Huang; Yan Gao

arXiv:2507.21274 · cs.LG · July 30, 2025

TL;DR

This paper introduces LAAC, a reinforcement learning approach that uses large language models as reference policies to enhance diversity and novelty in recommendations, balancing exploration with user relevance.

Contribution

The paper presents a novel bilevel optimization framework integrating LLMs into RL for recommendation diversity without extensive fine-tuning.

Findings

01. LAAC outperforms baselines in diversity, novelty, and accuracy.

02. The method is robust on imbalanced datasets.

03. It effectively leverages LLM knowledge without expensive fine-tuning.

Abstract

In recommendation systems, diversity and novelty are essential for capturing varied user preferences and encouraging exploration, yet many systems prioritize click relevance. While reinforcement learning (RL) has been explored to improve diversity, it often depends on random exploration that may not align with user interests. We propose LAAC (LLM-guided Adversarial Actor Critic), a novel method that leverages large language models (LLMs) as reference policies to suggest novel items, while training a lightweight policy to refine these suggestions using system-specific data. The method formulates training as a bilevel optimization between actor and critic networks, enabling the critic to selectively favor promising novel actions and the actor to improve its policy beyond LLM recommendations. To mitigate overestimation of unreliable LLM suggestions, we apply regularization that anchors…
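
To make the idea of a "reference policy" concrete, here is a minimal sketch of a generic KL-regularized policy update. It is not taken from the paper, and the abstract does not say that LAAC uses this exact rule. It only illustrates one standard way a lightweight policy can start from a prior distribution (a stand-in for LLM suggestions) and shift toward actions a critic scores highly. The function names, the toy probabilities, the critic values, and the temperature `beta` are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def kl_regularized_policy(ref_probs, q_values, beta):
    """Policy that maximizes E_pi[Q] - beta * KL(pi || ref).

    The maximizer has the closed form pi(a) proportional to
    ref(a) * exp(Q(a) / beta). Assumes every ref_probs entry is > 0
    and beta > 0.
    """
    logits = [math.log(p) + q / beta for p, q in zip(ref_probs, q_values)]
    return softmax(logits)


# Hypothetical toy setup with four candidate items.
ref_probs = [0.1, 0.2, 0.3, 0.4]  # stand-in for an LLM's suggestion distribution
q_values = [1.0, 0.0, 0.5, -1.0]  # stand-in for critic estimates from logged data

# Large beta keeps the policy close to the reference;
# small beta lets the critic's estimates dominate.
for beta in (10.0, 1.0, 0.1):
    pi = kl_regularized_policy(ref_probs, q_values, beta)
    print(beta, [round(p, 3) for p in pi])
```

In this sketch `beta` controls how strongly the policy is held to the reference distribution. The adversarial actor-critic training and the specific regularizer described in the abstract are not reproduced here.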
