Do LLMs Memorize Recommendation Datasets? A Preliminary Study on MovieLens-1M

Dario Di Palma; Felice Antonio Merra; Maurizio Sfilio; Vito Walter Anelli; Fedelucio Narducci; Tommaso Di Noia

arXiv:2505.10212·cs.IR·May 16, 2025

TL;DR

This study investigates whether large language models memorize recommendation datasets like MovieLens-1M, analyzing how memorization affects recommendation quality and varies across model types and sizes.

Contribution

It provides the first systematic analysis of dataset memorization in LLMs for recommendation tasks, highlighting its impact on performance and bias.

Findings

01 All models show some memorization of MovieLens-1M.

02 Memorization correlates with recommendation performance.

03 Memorization varies across model families and sizes.

Abstract

Large Language Models (LLMs) have become increasingly central to recommendation scenarios due to their remarkable natural language understanding and generation capabilities. Although significant research has explored the use of LLMs for various recommendation tasks, little effort has been dedicated to verifying whether they have memorized public recommendation datasets as part of their training data. This is undesirable because memorization reduces the generalizability of research findings, as benchmarking on memorized datasets does not guarantee generalization to unseen datasets. Furthermore, memorization can amplify biases: for example, some popular items may be recommended more frequently than others. In this work, we investigate whether LLMs have memorized public recommendation datasets. Specifically, we examine two model families (GPT and Llama) across multiple sizes, focusing on…
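As a rough illustration of what "memorizing a recommendation dataset" could mean in practice, the sketch below measures how many MovieLens-1M item titles a model reproduces exactly when given only the MovieID. This is a minimal sketch under stated assumptions, not necessarily the protocol used in the paper: the `MovieID::Title::Genres` layout is that of the dataset's `movies.dat` file, while `parse_movies`, `title_coverage`, and the `fake_model` stub (which stands in for a real LLM call) are hypothetical names introduced here.

```python
from typing import Callable, Dict, Iterable

# Two lines in the movies.dat layout of MovieLens-1M: MovieID::Title::Genres
SAMPLE_LINES = [
    "1::Toy Story (1995)::Animation|Children's|Comedy",
    "2::Jumanji (1995)::Adventure|Children's|Fantasy",
]


def parse_movies(lines: Iterable[str]) -> Dict[int, str]:
    """Map each MovieID to its title, ignoring the genre field."""
    movies: Dict[int, str] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Split the id off the front and the genres off the back, so a
        # title containing "::" would still be kept whole.
        movie_id, rest = line.split("::", 1)
        title, _genres = rest.rsplit("::", 1)
        movies[int(movie_id)] = title
    return movies


def title_coverage(movies: Dict[int, str], ask_model: Callable[[int], str]) -> float:
    """Fraction of items whose title the model returns verbatim."""
    if not movies:
        return 0.0
    hits = sum(
        1 for movie_id, title in movies.items()
        if ask_model(movie_id).strip() == title
    )
    return hits / len(movies)


def fake_model(movie_id: int) -> str:
    """Hypothetical stand-in for an LLM query; it 'remembers' only MovieID 1."""
    known = {1: "Toy Story (1995)"}
    return known.get(movie_id, "Unknown")


if __name__ == "__main__":
    movies = parse_movies(SAMPLE_LINES)
    print(title_coverage(movies, fake_model))
```

In a real probe, `fake_model` would be replaced by a function that prompts the model under test and returns its completion. Exact string matching is the strictest choice; a fuzzy match would count near-verbatim recall as well.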

Code & Models

Repositories

sisinflab/llm-memoryinspector (PyTorch, official)
