In-context Clustering-based Entity Resolution with Large Language Models: A Design Space Exploration
Jiajie Fu, Haitong Tang, Arijit Khan, Sharad Mehrotra, Xiangyu Ke, Yunjun Gao

TL;DR
This paper introduces a novel in-context clustering method using Large Language Models for Entity Resolution, significantly improving accuracy and efficiency while reducing costs compared to traditional pairwise approaches.
Contribution
It proposes LLM-CER, a scalable clustering-based ER approach that explores the design space and addresses challenges like cluster merging and hallucination, advancing LLM applications in ER.
Findings
Up to 150% higher accuracy in ER tasks.
10% increase in F-measure over baselines.
Up to 5 times fewer API calls.
Abstract
Entity Resolution (ER) is a fundamental data quality improvement task that identifies and links records referring to the same real-world entity. Traditional ER approaches often rely on pairwise comparisons, which can be costly in terms of time and monetary resources, especially with large datasets. Recently, Large Language Models (LLMs) have shown promising results in ER tasks. However, existing methods typically focus on pairwise matching, missing the potential of LLMs to perform clustering directly in a more cost-effective and scalable manner. In this paper, we propose a novel in-context clustering approach for ER, where LLMs are used to cluster records directly, reducing both time complexity and monetary costs. We systematically investigate the design space for in-context clustering, analyzing the impact of factors such as set size, diversity, variation, and ordering of records on…
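The cost argument in the abstract can be made concrete with a rough sketch. The code below is not the paper's LLM-CER algorithm. It only contrasts the number of LLM calls needed for exhaustive pairwise matching with the number needed when records are clustered in sets, then shows one simple way to merge per-set clusters that share a record. The function names, the set size of 9, and the union-find merge are illustrative assumptions, and no LLM is actually called.

```python
from math import ceil


def pairwise_calls(n: int) -> int:
    """LLM calls if every unordered pair of n records is compared once."""
    return n * (n - 1) // 2


def in_context_calls(n: int, set_size: int) -> int:
    """LLM calls if records are split into disjoint sets of `set_size`
    and each set is clustered in a single prompt (hypothetical setup)."""
    return ceil(n / set_size)


def merge_clusters(batch_outputs):
    """Merge clusters returned for different record sets.

    `batch_outputs` is a list of per-prompt results; each result is a list
    of clusters, and each cluster is a list of record ids. Clusters that
    share any record id are merged (transitive closure via union-find).
    This is an assumed merging rule, not the one described in the paper.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for clusters in batch_outputs:
        for cluster in clusters:
            members = list(cluster)
            if not members:
                continue
            find(members[0])  # register singleton clusters too
            for other in members[1:]:
                union(members[0], other)

    groups = {}
    for record_id in list(parent):
        groups.setdefault(find(record_id), set()).add(record_id)
    return sorted(sorted(group) for group in groups.values())


if __name__ == "__main__":
    print(pairwise_calls(100), in_context_calls(100, 9))
    print(merge_clusters([[[1, 2], [3]], [[2, 4], [5]]]))
```

Under these assumptions, the call count grows quadratically with the number of records for pairwise matching but only linearly for set-based clustering. In practice, blocking or filtering would reduce the pairwise count, and a real clustering pipeline would need extra calls to merge clusters across sets.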
Taxonomy
Topics: Data Quality and Management · Service-Oriented Architecture and Web Services · Web Data Mining and Analysis
