
TL;DR
This paper shows that Large Language Models and Generative AI improve the detection and repair of duplicate customer records in CRMs, roughly doubling de-duplication accuracy (from 30% to 60% on benchmark datasets) compared with traditional NLP methods.
Contribution
The paper introduces a novel approach using GenAI for duplicate detection in CRMs, achieving nearly double the accuracy of existing NLP-based techniques.
Findings
De-duplication accuracy improved from 30% to 60%.
GenAI approach outperforms traditional NLP methods.
Benchmark datasets validate the effectiveness of the proposed method.
Abstract
Customer data is often stored as records in Customer Relations Management systems (CRMs). Data which is manually entered into such systems by one or more users over time leads to data replication, partial duplication or fuzzy duplication. This in turn means that there is no longer a single source of truth for customers, contacts, accounts, etc. Downstream business processes become increasingly complex and contrived without a unique mapping between a record in a CRM and the target customer. Current methods to detect and de-duplicate records use traditional Natural Language Processing techniques known as Entity Matching. In this paper we show how using the latest advancements in Large Language Models and Generative AI can vastly improve the identification and repair of duplicated records. On common benchmark datasets we find an improvement in the accuracy of data de-duplication rates from 30…
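To make "fuzzy duplication" concrete, the sketch below flags near-identical CRM records with a plain string-similarity score. It is not the paper's method: it is a deliberately simple stand-in for the kind of traditional matching the abstract contrasts with the GenAI approach. The field names (name, email, company), the helper functions, the sample records, and the 0.85 threshold are all hypothetical choices made for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical fields to compare; a real CRM schema will differ.
FIELDS = ("name", "email", "company")


def normalize(record):
    """Join the compared fields, lowercase them, and collapse whitespace."""
    joined = " ".join(str(record.get(field, "")) for field in FIELDS)
    return " ".join(joined.lower().split())


def similarity(a, b):
    """Return a similarity ratio in [0, 1] between two records."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def candidate_duplicates(records, threshold=0.85):
    """Return (i, j, score) for every record pair scoring at or above threshold.

    The threshold is an assumed value, not one taken from the paper.
    """
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(records), 2):
        score = similarity(a, b)
        if score >= threshold:
            pairs.append((i, j, score))
    return pairs


# Made-up sample records: the first two differ only by a typo and a full stop.
records = [
    {"name": "Jane Smith", "email": "jane.smith@example.com", "company": "Acme Ltd"},
    {"name": "Jane Smyth", "email": "jane.smith@example.com", "company": "Acme Ltd."},
    {"name": "Robert Brown", "email": "rbrown@example.org", "company": "Globex"},
]

for i, j, score in candidate_duplicates(records):
    print(f"records {i} and {j} look like duplicates (score {score:.2f})")
```

On this sample only the first two records are flagged. Comparing every pair is quadratic in the number of records, so a sketch like this is only practical for small sets, and a purely character-based score will miss duplicates that differ in wording rather than spelling.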
Taxonomy
Topics: Computational and Text Analysis Methods · Digital Media Forensic Detection · Machine Learning and Data Classification
