LinkTransformer: A Unified Package for Record Linkage with Transformer Language Models
Abhishek Arora, Melissa Dell

TL;DR
LinkTransformer is an open-source package that simplifies record linkage using transformer language models, making advanced deep learning methods accessible and easy to integrate into existing workflows across multiple languages.
Contribution
It introduces a user-friendly, versatile package that extends traditional string matching with transformer models. The package supports multiple languages, customization, and easy integration of models from popular model hubs.
Findings
Provides a four-line code interface for transformer-based record linkage.
Includes a repository of pre-trained multilingual semantic similarity models.
Supports efficient model tuning and easy contribution of custom models.
Abstract
Linking information across sources is fundamental to a variety of analyses in social science, business, and government. While large language models (LLMs) offer enormous promise for improving record linkage in noisy datasets, in many domains approximate string matching packages in popular software such as R and Stata remain predominant. These packages have clean, simple interfaces and can be easily extended to a diversity of languages. Our open-source package LinkTransformer aims to extend the familiarity and ease of use of popular string matching methods to deep learning. It is a general-purpose package for record linkage with transformer LLMs that treats record linkage as a text retrieval problem. At its core is an off-the-shelf toolkit for applying transformer models to record linkage with four lines of code. LinkTransformer contains a rich repository of pre-trained transformer…
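To illustrate what treating record linkage as a text retrieval problem can look like, here is a minimal sketch in plain Python. It is not LinkTransformer's API. The `embed` function is a hypothetical stand-in that uses normalized character trigram counts, whereas the package described above embeds records with transformer models. The helper names (`embed`, `cosine`, `link`) and the sample company names are assumptions made for this example only.

```python
import math
from collections import Counter


def embed(text, n=3):
    """Hypothetical stand-in for a transformer encoder.

    Returns an L2-normalized bag of character n-grams as a sparse dict.
    """
    padded = f"  {text.lower().strip()} "
    grams = Counter(padded[i:i + n] for i in range(len(padded) - n + 1))
    norm = math.sqrt(sum(c * c for c in grams.values()))
    return {g: c / norm for g, c in grams.items()}


def cosine(u, v):
    """Dot product of two normalized sparse vectors (cosine similarity)."""
    if len(u) > len(v):
        u, v = v, u
    return sum(w * v.get(g, 0.0) for g, w in u.items())


def link(queries, corpus, k=1):
    """For each query record, retrieve the k most similar corpus records."""
    corpus_vecs = [embed(c) for c in corpus]
    results = {}
    for q in queries:
        qv = embed(q)
        scored = sorted(
            ((cosine(qv, cv), c) for cv, c in zip(corpus_vecs, corpus)),
            reverse=True,
        )
        results[q] = [(c, round(s, 3)) for s, c in scored[:k]]
    return results


# Illustrative records only; not data from the paper.
left = ["Intl Business Machines", "Gen. Electric Co", "Toyota Motor Corp."]
right = [
    "General Electric Company",
    "Toyota Motor Corporation",
    "International Business Machines",
]

matches = link(left, right)
for query, hits in matches.items():
    print(query, "->", hits)
```

The retrieval structure is the point of the sketch: every record becomes a vector, and linkage reduces to a nearest-neighbor lookup. Swapping the trigram stand-in for a semantic encoder changes the quality of the vectors but not the shape of the procedure.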
Code & Models
- 🤗 dell-research-harvard/lt-wikidata-comp-es (model) · 3 dl
- 🤗 dell-research-harvard/lt-wikidata-comp-fr (model) · 2 dl
- 🤗 dell-research-harvard/lt-wikidata-comp-ja (model) · 4 dl
- 🤗 dell-research-harvard/lt-wikidata-comp-zh (model) · 20 dl · ♡ 1
- 🤗 dell-research-harvard/lt-wikidata-comp-de (model) · 8 dl
- 🤗 dell-research-harvard/lt-wikidata-comp-en (model) · 411k dl · ♡ 2
- 🤗 dell-research-harvard/lt-mexicantrade4748 (model) · 2 dl
- 🤗 dell-research-harvard/lt-un-data-fine-fine-en (model) · 45 dl
- 🤗 dell-research-harvard/lt-un-data-fine-fine-es (model) · 99 dl
- 🤗 dell-research-harvard/lt-un-data-fine-fine-fr (model) · 3 dl
Taxonomy
Topics: Data Quality and Management · Topic Modeling
