On Leveraging Large Language Models for Enhancing Entity Resolution: A Cost-efficient Approach

Huahang Li; Longyu Feng; Shuangyin Li; Fei Hao; Chen Jason Zhang; Yuanfeng Song

arXiv:2401.03426·cs.CL·September 13, 2024·6 cites

Open Access

TL;DR

This paper presents a cost-efficient framework leveraging Large Language Models for entity resolution, reducing API costs and improving accuracy through uncertainty reduction and strategic querying.

Contribution

It introduces an uncertainty reduction framework with algorithms for selective querying and error tolerance, enhancing LLM-based entity resolution at scale.

Findings

01. Significant cost reduction in LLM API usage.

02. Improved accuracy in entity resolution tasks.

03. Effective handling of LLM errors and dynamic partition adjustment.

Abstract

Entity resolution, the task of identifying and merging records that refer to the same real-world entity, is crucial in sectors like e-commerce, healthcare, and law enforcement. Large Language Models (LLMs) introduce an innovative approach to this task, capitalizing on their advanced linguistic capabilities and a "pay-as-you-go" model that provides significant advantages to those without extensive data science expertise. However, current LLMs are costly because they are billed per API request. Existing methods often either lack quality or become prohibitively expensive at scale. To address these problems, we propose an uncertainty reduction framework using LLMs to improve entity resolution results. We first initialize possible partitions of the records into entity clusters, where the records in a cluster refer to the same entity, and define the uncertainty of the result. Then, we reduce the uncertainty by selecting a few valuable matching…
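
The uncertainty-reduction idea in the abstract can be illustrated with a small sketch. The abstract is truncated here and does not give the paper's definitions, so everything below is an assumption made for illustration: uncertainty is taken to be the Shannon entropy of a probability distribution over candidate partitions, the LLM's answer to a matching question is treated as error-free, and the names `partition`, `entropy`, `expected_entropy_after`, and `best_question` are hypothetical. The sketch picks the single pairwise question whose answer is expected to leave the least entropy, so that fewer billed requests are spent on uninformative pairs.

```python
import math
from itertools import combinations


def partition(*clusters):
    """Build a hashable partition: a frozenset of clusters (frozensets of record ids)."""
    return frozenset(frozenset(c) for c in clusters)


# Hypothetical candidate partitions of records a, b, c with assumed probabilities.
candidates = {
    partition("ab", "c"): 0.5,   # a and b match, c is separate
    partition("a", "bc"): 0.3,   # b and c match, a is separate
    partition("abc"): 0.2,       # all three refer to one entity
}


def entropy(dist):
    """Shannon entropy (in bits) of a distribution over partitions."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def same_cluster(part, x, y):
    """True if records x and y share a cluster in this partition."""
    return any(x in c and y in c for c in part)


def expected_entropy_after(dist, x, y):
    """Expected entropy after asking 'do x and y match?'.

    Assumes the answer is always correct (no LLM error model).
    """
    expected = 0.0
    for answer in (True, False):
        # Keep only partitions consistent with this answer.
        consistent = {p: w for p, w in dist.items() if same_cluster(p, x, y) == answer}
        mass = sum(consistent.values())
        if mass == 0:
            continue
        posterior = {p: w / mass for p, w in consistent.items()}
        expected += mass * entropy(posterior)
    return expected


def best_question(dist, records):
    """Pick the record pair whose answer minimizes expected remaining entropy."""
    pairs = combinations(sorted(records), 2)
    return min(pairs, key=lambda q: expected_entropy_after(dist, *q))


if __name__ == "__main__":
    print("initial entropy:", round(entropy(candidates), 4))
    print("best question:", best_question(candidates, "abc"))
```

In this toy distribution the pair (b, c) is chosen because the candidate partitions split evenly on whether b and c match, so that answer is the most informative one. A realistic version would also need to weigh per-question cost and tolerate wrong answers, which this sketch does not attempt.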

Taxonomy

Topics: Data Quality and Management · Topic Modeling · Natural Language Processing Techniques

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Softmax · Adam · Layer Normalization · Residual Connection · Absolute Position Encodings · Dropout · Dense Connections