Cost-Efficient Prompt Engineering for Unsupervised Entity Resolution
Navapat Nananukul, Khanin Sisaengsuwanchai, Mayank Kejriwal

TL;DR
This paper systematically evaluates cost-effective prompt engineering methods for unsupervised Entity Resolution using LLMs like GPT-3.5, finding that simpler prompts often match or outperform more complex, expensive approaches.
Contribution
It provides the first comprehensive experimental analysis of prompt engineering techniques for unsupervised ER with LLMs, highlighting the effectiveness of simple prompts.
Findings
Simple prompts achieve high accuracy in ER tasks.
More complex prompts do not significantly outperform simpler ones.
LLMs like GPT-3.5 are viable for high-quality unsupervised ER.
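As an illustration of what a "simple prompt" for pairwise ER could look like, here is a minimal sketch. It is not the prompt used in the paper: the wording, the attribute serialization, and the helper names build_match_prompt and parse_answer are all assumptions made for this example, and no LLM call is made.

```python
# Hypothetical sketch of a simple pairwise ER prompt; not the paper's prompt.

def build_match_prompt(record_a: dict, record_b: dict) -> str:
    """Serialize two records as 'key: value' pairs and ask a yes/no question."""
    def serialize(record: dict) -> str:
        return ", ".join(f"{key}: {value}" for key, value in record.items())

    return (
        "Do the following two records refer to the same real-world entity? "
        "Answer 'yes' or 'no'.\n"
        f"Record A: {serialize(record_a)}\n"
        f"Record B: {serialize(record_b)}"
    )


def parse_answer(text: str) -> bool:
    """Treat any reply that starts with 'yes' (case-insensitive) as a match."""
    return text.strip().lower().startswith("yes")


if __name__ == "__main__":
    a = {"title": "iPhone 13 128GB", "brand": "Apple"}
    b = {"title": "Apple iPhone 13 (128 GB)", "brand": "Apple"}
    print(build_match_prompt(a, b))
```

The string returned by build_match_prompt would be sent to whichever model is under evaluation, and parse_answer maps the reply to a match or non-match decision. A shorter prompt also means fewer input tokens per pair, which is where the cost saving would come from.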
Abstract
Entity Resolution (ER) is the problem of semi-automatically determining when two entities refer to the same underlying entity, with applications ranging from healthcare to e-commerce. Traditional ER solutions required considerable manual expertise, including domain-specific feature engineering, as well as identification and curation of training data. Recently released large language models (LLMs) provide an opportunity to make ER more seamless and domain-independent. However, it is also well known that LLMs can pose risks, and that the quality of their outputs can depend on how prompts are engineered. Unfortunately, a systematic experimental study on the effects of different prompting methods for addressing unsupervised ER, using LLMs like ChatGPT, has been lacking thus far. This paper aims to address this gap by conducting such a study. We consider some relatively simple and…
Taxonomy
Topics: Data Quality and Management · Topic Modeling · Privacy-Preserving Technologies in Data
