On Leveraging Large Language Models for Enhancing Entity Resolution: A Cost-efficient Approach
Huahang Li, Longyu Feng, Shuangyin Li, Fei Hao, Chen Jason Zhang, Yuanfeng Song

TL;DR
This paper presents a cost-efficient framework leveraging Large Language Models for entity resolution, reducing API costs and improving accuracy through uncertainty reduction and strategic querying.
Contribution
It introduces an uncertainty reduction framework with algorithms for selective querying and error tolerance, enhancing LLM-based entity resolution at scale.
Findings
Significant cost reduction in LLM API usage.
Improved accuracy in entity resolution tasks.
Effective handling of LLM errors and dynamic partition adjustment.
Abstract
Entity resolution, the task of identifying and merging records that refer to the same real-world entity, is crucial in sectors like e-commerce, healthcare, and law enforcement. Large Language Models (LLMs) introduce an innovative approach to this task, capitalizing on their advanced linguistic capabilities and a "pay-as-you-go" model that provides significant advantages to those without extensive data science expertise. However, current LLMs are costly because they are billed per API request. Existing methods often either lack quality or become prohibitively expensive at scale. To address these problems, we propose an uncertainty reduction framework that uses LLMs to improve entity resolution results. We first initialize the possible partitions of the records into entity clusters, where each cluster groups records that refer to the same entity, and define the uncertainty of the result. Then, we reduce the uncertainty by selecting a few valuable matching…
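The loop described in the abstract (keep a distribution over candidate partitions, measure its uncertainty, ask the question expected to reduce it most, then update) can be illustrated with a minimal sketch. The abstract is truncated and does not give the paper's formulas, so everything concrete here is an assumption made for illustration: Shannon entropy as the uncertainty measure, a Bayesian update with a fixed `error_rate` for LLM mistakes, and the hypothetical function names `entropy`, `same_cluster`, `update`, `expected_entropy`, and `best_question`. No LLM is called; `answer` stands in for a yes/no reply to a matching question.

```python
import math
from itertools import combinations


def entropy(probs):
    """Shannon entropy in bits (assumed uncertainty measure, not from the paper)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def same_cluster(partition, a, b):
    """True if records a and b share a cluster in this candidate partition."""
    return any(a in cluster and b in cluster for cluster in partition)


def update(partitions, probs, pair, answer, error_rate):
    """Bayes update after a noisy yes/no answer to 'do these two records match?'.

    error_rate is an assumed fixed probability that the answer is wrong.
    """
    a, b = pair
    weights = []
    for part, p in zip(partitions, probs):
        agrees = same_cluster(part, a, b) == answer
        weights.append(p * ((1 - error_rate) if agrees else error_rate))
    total = sum(weights)
    if total == 0:  # answer impossible under every partition; keep the prior
        return list(probs)
    return [w / total for w in weights]


def expected_entropy(partitions, probs, pair, error_rate):
    """Expected entropy of the posterior if we ask about `pair`."""
    a, b = pair
    result = 0.0
    for answer in (True, False):
        # Probability of observing this answer under the current distribution.
        p_answer = sum(
            p * ((1 - error_rate) if same_cluster(part, a, b) == answer else error_rate)
            for part, p in zip(partitions, probs)
        )
        if p_answer > 0:
            posterior = update(partitions, probs, pair, answer, error_rate)
            result += p_answer * entropy(posterior)
    return result


def best_question(partitions, probs, records, error_rate):
    """Pick the record pair whose answer minimizes expected remaining entropy."""
    return min(
        combinations(records, 2),
        key=lambda pair: expected_entropy(partitions, probs, pair, error_rate),
    )


# Toy example: three records, three candidate partitions with made-up probabilities.
records = ["r1", "r2", "r3"]
partitions = [
    [{"r1", "r2"}, {"r3"}],
    [{"r1"}, {"r2"}, {"r3"}],
    [{"r1", "r2", "r3"}],
]
probs = [0.5, 0.3, 0.2]

pair = best_question(partitions, probs, records, error_rate=0.1)
posterior = update(partitions, probs, pair, answer=True, error_rate=0.1)
print(pair, posterior)
```

Scoring every pair by expected posterior entropy is only practical for small candidate sets; it is used here because it makes the trade-off between question value and API cost explicit, not because the source specifies it.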
Taxonomy
Topics: Data Quality and Management · Topic Modeling · Natural Language Processing Techniques
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Softmax · Adam · Layer Normalization · Residual Connection · Absolute Position Encodings · Dropout · Dense Connections
