Match, Compare, or Select? An Investigation of Large Language Models for Entity Matching
Tianshu Wang, Xiaoyang Chen, Hongyu Lin, Xuanang Chen, Xianpei Han, Hao Wang, Zhenyu Zeng, Le Sun

TL;DR
This paper investigates various large language model-based entity matching strategies, compares their effectiveness, and proposes a combined framework (ComEM) that improves accuracy and efficiency across multiple datasets.
Contribution
The paper comprehensively compares the matching, comparing, and selecting strategies for LLM-based entity matching and proposes a compound framework (ComEM) that leverages their respective strengths.
Findings
The selecting strategy outperforms the other strategies in effectiveness.
ComEM improves both accuracy and efficiency.
Experiments on multiple datasets confirm these improvements.
Abstract
Entity matching (EM) is a critical step in entity resolution (ER). Recently, entity matching based on large language models (LLMs) has shown great promise. However, current LLM-based entity matching approaches typically follow a binary matching paradigm that ignores the global consistency among record relationships. In this paper, we investigate various methodologies for LLM-based entity matching that incorporate record interactions from different perspectives. Specifically, we comprehensively compare three representative strategies: matching, comparing, and selecting, and analyze their respective advantages and challenges in diverse scenarios. Based on our findings, we further design a compound entity matching framework (ComEM) that leverages the composition of multiple strategies and LLMs. ComEM benefits from the advantages of different sides and achieves improvements in both…
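The three strategies named in the abstract differ mainly in how many records the model sees per call. The sketch below is only an illustration of that difference and is not the authors' implementation: a toy token-overlap score (`token_overlap`, a hypothetical helper) stands in for the LLM, and the function names, the `threshold` parameter, and the sample records are all assumptions made for this example.

```python
from typing import List, Optional


def token_overlap(a: str, b: str) -> float:
    """Jaccard overlap of lowercase tokens; a toy stand-in for an LLM judgment."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def match(anchor: str, candidate: str, threshold: float = 0.5) -> bool:
    """Matching: an independent yes/no decision for one (anchor, candidate) pair."""
    return token_overlap(anchor, candidate) >= threshold


def compare(anchor: str, cand_a: str, cand_b: str) -> str:
    """Comparing: given two candidates, return the one closer to the anchor."""
    if token_overlap(anchor, cand_a) >= token_overlap(anchor, cand_b):
        return cand_a
    return cand_b


def select(anchor: str, candidates: List[str], threshold: float = 0.5) -> Optional[int]:
    """Selecting: look at all candidates at once and return the index of the
    best one, or None when no candidate looks like a match."""
    if not candidates:
        return None
    scores = [token_overlap(anchor, c) for c in candidates]
    best = max(range(len(candidates)), key=lambda i: scores[i])
    return best if scores[best] >= threshold else None


# Hypothetical product records, invented for this example.
anchor = "apple iphone 13 128gb blue"
candidates = [
    "apple iphone 13 128gb blue smartphone",
    "apple iphone 12 64gb black",
    "samsung galaxy s21 128gb",
]

pairwise = [match(anchor, c) for c in candidates]      # one decision per pair
closer = compare(anchor, candidates[1], candidates[2])  # one decision per pair of candidates
chosen = select(anchor, candidates)                     # one decision per candidate list
```

Matching treats every pair in isolation, which is the binary paradigm the abstract criticizes for ignoring consistency across records. Comparing and selecting expose several candidates to the model in a single decision.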
Taxonomy
Topics: Topic Modeling · Data Quality and Management · Natural Language Processing Techniques
