The Mental World of Large Language Models in Recommendation: A Benchmark on Association, Personalization, and Knowledgeability
Guangneng Hu

TL;DR
This paper introduces LRWorld, a comprehensive benchmark to evaluate large language models in recommendation systems across association, personalization, and knowledgeability, revealing their strengths and limitations.
Contribution
It presents LRWorld, a new benchmark of over 38K samples and 31 measures, to systematically assess LLMs' capabilities and boundaries in recommendation tasks.
Findings
LLMs struggle with deep personalized embeddings.
They excel at shallow item-item similarity and entity relations.
They also perform well at multimodal knowledge reasoning and are robust to noise.
Abstract
Large language models (LLMs) have shown potential in recommendation systems (RecSys) when used as either knowledge enhancers or zero-shot rankers. A key challenge lies in the large semantic gap between LLMs and RecSys: the former internalize world knowledge expressed in language, while the latter capture a personalized world of behaviors. Unfortunately, the research community lacks a comprehensive benchmark that evaluates the limitations and boundaries of LLMs in RecSys, which makes it hard to draw confident conclusions. To investigate this, we propose a benchmark named LRWorld containing over 38K high-quality samples and 23M tokens carefully compiled and generated from widely used public recommendation datasets. LRWorld categorizes the mental world of LLMs in RecSys into three main scales (association, personalization, and knowledgeability) spanned by ten factors with 31 measures (tasks). Based…
Taxonomy
Topics: Recommender Systems and Techniques · Explainable Artificial Intelligence (XAI) · Topic Modeling
