Leveraging large language models for efficient representation learning for entity resolution

Xiaowei Xu; Bi T. Foua; Xingqiao Wang; Vivek Gunasekaran; John R. Talburt

arXiv:2411.10629 · cs.CL · November 19, 2024

Open Access

TL;DR

This paper introduces TriBERTa, a supervised entity resolution system that fine-tunes representations from a pre-trained large language model (Sentence-BERT) with a triplet loss, improving entity matching by 3 to 19% over SBERT without fine-tuning and TF-IDF, and producing representations that stay robust across multiple datasets.

Contribution

The paper presents TriBERTa, a new supervised approach that combines a pre-trained large language model with contrastive learning based on a triplet loss to improve entity resolution.
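Supervised contrastive training with a triplet loss needs (anchor, positive, negative) examples drawn from labeled records. The sketch below shows one simple way to build them; it assumes each record carries an entity identifier shared by all records of the same real-world entity, and it picks negatives at random. The function name `build_triplets` and the random-negative strategy are hypothetical illustrations, not the paper's documented sampling procedure.

```python
import random
from collections import defaultdict


def build_triplets(records, seed=0):
    """Form (anchor, positive, negative) text triplets from labeled records.

    records: list of (entity_id, text) pairs. Assumption: records sharing an
    entity_id refer to the same real-world entity.
    """
    rng = random.Random(seed)  # seeded so the sampled negatives are reproducible
    by_entity = defaultdict(list)
    for entity_id, text in records:
        by_entity[entity_id].append(text)

    entity_ids = list(by_entity)
    triplets = []
    for entity_id, texts in by_entity.items():
        others = [e for e in entity_ids if e != entity_id]
        if len(texts) < 2 or not others:
            continue  # need at least one positive partner and one other entity
        for i, anchor in enumerate(texts):
            for j, positive in enumerate(texts):
                if i == j:
                    continue
                # Hypothetical strategy: a random record from a random other entity.
                negative = rng.choice(by_entity[rng.choice(others)])
                triplets.append((anchor, positive, negative))
    return triplets
```

For example, with two records of one person and a single record of another, `build_triplets` yields two triplets (each duplicate record serves once as the anchor), and both use the other person's record as the negative. In practice, harder negatives (similar-looking records of different entities) are often preferred to random ones, but that choice is outside what this page describes.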

Findings

01. Outperforms SBERT without fine-tuning and TF-IDF by 3 to 19% in accuracy

02. Produces more robust representations across datasets

03. Demonstrates the effectiveness of triplet loss in entity matching
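To make the TF-IDF baseline in the first finding concrete, here is a minimal sketch of TF-IDF vectors compared by cosine similarity. The tokenization (lowercased whitespace split) and the smoothed IDF formula are assumptions for illustration; the page does not state how the paper configured its TF-IDF baseline, and `tfidf_vectors` and `cosine` are hypothetical helper names.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict token -> weight) per document."""
    tokenized = [doc.lower().split() for doc in docs]  # assumed tokenization
    n_docs = len(tokenized)
    df = Counter(tok for toks in tokenized for tok in set(toks))  # document frequency
    # Smoothed IDF: log((1 + N) / (1 + df)) + 1
    idf = {tok: math.log((1 + n_docs) / (1 + count)) + 1 for tok, count in df.items()}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)  # raw term counts
        vectors.append({tok: cnt * idf[tok] for tok, cnt in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(tok, 0.0) for tok, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0
```

With the records "john smith oak street" and "jon smith oak street", the similarity is only about 0.63, because token-level TF-IDF treats "john" and "jon" as unrelated tokens. That kind of surface variation is what learned representations are meant to absorb.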

Abstract

In this paper, the authors propose TriBERTa, a supervised entity resolution system that uses a pre-trained large language model and a triplet loss function to learn representations for entity matching. The system consists of two steps: first, name entity records are fed into a Sentence Bidirectional Encoder Representations from Transformers (SBERT) model to generate vector representations; second, these representations are fine-tuned using contrastive learning based on a triplet loss function. The fine-tuned representations are then used as input for entity matching tasks, and the results show that the proposed approach outperforms state-of-the-art representations, including SBERT without fine-tuning and conventional Term Frequency-Inverse Document Frequency (TF-IDF), by a margin of 3 to 19%. Additionally, the representations generated by TriBERTa demonstrated increased robustness, maintaining consistently…
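The fine-tuning step relies on a triplet loss. In its common hinge form, with embeddings a (anchor), p (positive, same entity) and n (negative, different entity), it is L(a, p, n) = max(‖a − p‖ − ‖a − n‖ + margin, 0). Training drives this loss to zero by pulling same-entity records together and pushing different-entity records at least `margin` farther away. The sketch below assumes Euclidean distance and an arbitrary margin of 1.0; the paper's distance metric and margin are not given here. It also adds a hypothetical cosine-threshold rule, `is_match`, to show how fine-tuned vectors might feed a matching decision. Toy 2-D lists stand in for SBERT embeddings.

```python
import math


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge triplet loss: zero once the negative is at least `margin`
    farther from the anchor than the positive is (margin is an assumption)."""
    return max(euclidean(anchor, positive) - euclidean(anchor, negative) + margin, 0.0)


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def is_match(u, v, threshold=0.9):
    """Hypothetical matching rule: declare a match at or above a cosine threshold."""
    return cosine(u, v) >= threshold
```

For instance, `triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0])` is 0.0 because the negative is already 4 units farther away than the positive, while moving the negative to `[0.0, 1.5]` gives a loss of 0.5, which training would push down.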


Taxonomy

Topics: Data Quality and Management · Topic Modeling · Machine Learning in Healthcare

Methods: Contrastive Learning · Triplet Loss · Sentence-BERT