TL;DR
This study investigates whether large language models memorize recommendation datasets like MovieLens-1M, analyzing how memorization affects recommendation quality and varies across model types and sizes.
Contribution
The paper provides the first systematic analysis of dataset memorization in LLMs for recommendation tasks, highlighting the impact of memorization on performance and bias.
Findings
All examined models (from the GPT and Llama families) show some memorization of MovieLens-1M.
Memorization correlates with recommendation performance.
Memorization varies across model families and sizes.
Abstract
Large Language Models (LLMs) have become increasingly central to recommendation scenarios due to their remarkable natural language understanding and generation capabilities. Although significant research has explored the use of LLMs for various recommendation tasks, little effort has been dedicated to verifying whether they have memorized public recommendation datasets as part of their training data. This is undesirable because memorization reduces the generalizability of research findings: benchmarking on memorized datasets does not guarantee generalization to unseen datasets. Furthermore, memorization can amplify biases; for example, some popular items may be recommended more frequently than others. In this work, we investigate whether LLMs have memorized public recommendation datasets. Specifically, we examine two model families (GPT and Llama) across multiple sizes, focusing on…
