Evaluation on Entity Matching in Recommender Systems
Zihan Huang, Rohan Surana, Zhouhang Xie, Junda Wu, Yu Xia, Julian McAuley

TL;DR
This paper introduces a new dataset for cross-dataset entity matching in recommender systems, evaluates various matching methods, and provides a gold standard to facilitate future research in this area.
Contribution
It presents Reddit-Amazon-EM, a manually annotated dataset for entity matching, and offers a comprehensive evaluation of multiple matching techniques including LLM-based approaches.
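The contribution above pairs a manually annotated dataset with an evaluation of matching techniques. As a minimal sketch of how a matcher could be scored against such a gold standard, assume the annotations can be stored as a mapping from each Reddit item ID to its Amazon'23 item ID, or None when no counterpart exists. The ID format, the function name `evaluate_matches`, and the choice of precision, recall, and F1 are illustrative assumptions. The metrics the paper actually reports are not stated here.

```python
from typing import Dict, Optional


def evaluate_matches(
    predicted: Dict[str, Optional[str]],
    gold: Dict[str, Optional[str]],
) -> Dict[str, float]:
    """Score predicted links against gold links (hypothetical format:
    Reddit item ID -> Amazon'23 item ID, or None for "no match")."""
    # A predicted link is correct only if it names the same target as the gold
    # link. Predictions of None are treated as abstentions, not as links.
    true_pos = sum(
        1
        for src, tgt in predicted.items()
        if tgt is not None and gold.get(src) == tgt
    )
    num_predicted = sum(1 for tgt in predicted.values() if tgt is not None)
    num_gold = sum(1 for tgt in gold.values() if tgt is not None)

    precision = true_pos / num_predicted if num_predicted else 0.0
    recall = true_pos / num_gold if num_gold else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy IDs for illustration only; not drawn from Reddit-Amazon-EM.
    gold = {"r1": "B001", "r2": "B002", "r3": None}
    predicted = {"r1": "B001", "r2": "B009", "r3": "B003"}
    print(evaluate_matches(predicted, gold))
```

In this toy case one of three predicted links is correct and one of two gold links is recovered. That gives a precision of 1/3, a recall of 1/2, and an F1 of 0.4.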
Findings
LLM-based methods perform competitively in entity matching.
The dataset enables reproducible evaluation of matching algorithms.
The best method achieves high accuracy in cross-dataset entity alignment.
Abstract
Entity matching is a crucial component in various recommender systems, including conversational recommender systems (CRS) and knowledge-based recommender systems. However, the lack of rigorous evaluation frameworks for cross-dataset entity matching impedes progress in areas such as LLM-driven conversational recommendations and knowledge-grounded dataset construction. In this paper, we introduce Reddit-Amazon-EM, a novel dataset comprising naturally occurring items from Reddit and the Amazon'23 dataset. Through careful manual annotation, we identify corresponding movies across Reddit-Movies and Amazon'23, two existing recommender system datasets with inherently overlapping catalogs. Leveraging Reddit-Amazon-EM, we conduct a comprehensive evaluation of state-of-the-art entity matching methods, including rule-based, graph-based, lexical-based, embedding-based, and LLM-based approaches.…
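The sketch below makes two of the method families named in the abstract concrete: rule-based and lexical-based matching. It is a hypothetical title matcher. It first tries an exact match after rule-based normalization, then falls back to character-level similarity computed with Python's standard `difflib.SequenceMatcher`. It is not the authors' implementation. The normalization rules, the helper names `normalize_title` and `match_title`, and the 0.85 threshold are all assumptions chosen for illustration. The rules drop bracketed format tags such as "[Blu-ray]" and a trailing "(YYYY)" year.

```python
import difflib
import re
from typing import List, Optional


def normalize_title(title: str) -> str:
    """Hypothetical rule-based normalization for movie titles."""
    title = title.lower()
    title = re.sub(r"\[[^\]]*\]", " ", title)     # drop tags like "[blu-ray]"
    title = re.sub(r"\(\d{4}\)\s*$", "", title)   # drop a trailing "(1999)"
    title = re.sub(r"[^\w\s]", " ", title)        # punctuation -> space
    return " ".join(title.split())                # collapse whitespace


def match_title(
    query: str, candidates: List[str], threshold: float = 0.85
) -> Optional[str]:
    """Return the best candidate title, or None if none is similar enough."""
    norm_query = normalize_title(query)
    normalized = {cand: normalize_title(cand) for cand in candidates}

    # Rule-based pass: exact equality after normalization.
    for cand, norm in normalized.items():
        if norm == norm_query:
            return cand

    # Lexical pass: highest similarity ratio, accepted only above the threshold.
    best, best_score = None, 0.0
    for cand, norm in normalized.items():
        score = difflib.SequenceMatcher(None, norm_query, norm).ratio()
        if score > best_score:
            best, best_score = cand, score
    return best if best_score >= threshold else None
```

For example, `match_title("The Matrix (1999)", ["Matrix Reloaded", "The Matrix [Blu-ray]"])` is resolved by the exact-match rule. A query such as "Lord of the Rings: Fellowship of the Ring" can only reach "The Lord of the Rings: The Fellowship of the Ring [DVD]" through the similarity fallback. In this sketch, an embedding-based or LLM-based scorer would change only the scoring step.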
Taxonomy
Topics: Topic Modeling · Recommender Systems and Techniques · Machine Learning in Healthcare
