DAME: Domain Adaptation for Matching Entities

Mohamed Trabelsi; Jeff Heflin; Jin Cao

arXiv:2204.09244 · cs.LG · April 21, 2022

TL;DR

This paper introduces a domain adaptation approach for entity matching that leverages multiple source domains to improve performance on unseen target domains, especially in zero-shot and few-shot scenarios.

Contribution

The paper proposes a domain adaptation framework for entity matching that transfers knowledge from multiple source datasets to improve generalization to new, unseen domains.
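
One way to set up transfer from several source datasets to an unseen one is a leave-one-domain-out split: pool the labeled pairs of every other domain for training and hold the target domain out. The Python sketch below shows only that data split, assuming each domain is a list of labeled record pairs. `leave_one_domain_out` and the toy domains are hypothetical illustrations, not the DAME training code.

```python
from typing import Dict, List, Tuple

# One labeled EM example: (left record, right record, label), label 1 = match.
Pair = Tuple[Dict[str, str], Dict[str, str], int]


def leave_one_domain_out(
    domains: Dict[str, List[Pair]], target: str
) -> Tuple[List[Pair], List[Pair]]:
    """Pool every domain except `target` as source data and hold `target` out.

    Hypothetical helper. In a zero-shot run the returned target pairs would be
    used only for evaluation; in a few-shot run a small labeled subset of them
    could be used for fine-tuning.
    """
    if target not in domains:
        raise KeyError(f"unknown target domain: {target!r}")
    source_pairs = [
        pair
        for name, pairs in domains.items()
        if name != target
        for pair in pairs
    ]
    return source_pairs, list(domains[target])


# Hypothetical toy domains, for illustration only.
domains: Dict[str, List[Pair]] = {
    "products": [
        ({"title": "iPhone 13 128GB"}, {"title": "Apple iPhone 13 (128 GB)"}, 1),
        ({"title": "iPhone 13 128GB"}, {"title": "Galaxy S21 128GB"}, 0),
    ],
    "citations": [({"title": "Deep Entity Matching"}, {"title": "Entity Matching, Deep"}, 1)],
    "restaurants": [({"name": "Joe's Diner"}, {"name": "Joes Diner"}, 1)],
}

# Rotate the held-out domain so every dataset serves once as the target.
for held_out in domains:
    source, target = leave_one_domain_out(domains, held_out)
    print(f"target={held_out}: {len(source)} source pairs, {len(target)} target pairs")
```

Rotating the held-out domain gives one zero-shot evaluation per dataset; the model itself (whatever consumes `source`) is deliberately left out of the sketch.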

Findings

1. The method effectively transfers knowledge in zero-shot settings.
2. When fine-tuned on the target dataset, the model outperforms state-of-the-art EM methods.
3. The approach reduces overfitting to individual datasets.

Abstract

Entity matching (EM) identifies data records that refer to the same real-world entity. Despite efforts in recent years to improve EM performance, existing methods still require a large amount of labeled data in each domain during training. These methods treat each domain individually and capture signals specific to each dataset, which leads to overfitting on a single dataset. The knowledge learned from one dataset is not used to better understand the EM task and to make predictions on unseen datasets with fewer labeled samples. In this paper, we propose a new domain adaptation-based method that transfers task knowledge from multiple source domains to a target domain. Our method presents a new setting for EM where the objective is to capture the task-specific knowledge from pretraining our model using multiple source…
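
EM is framed here as deciding whether two records refer to the same entity. A common way transformer-based EM models consume such a pair is to serialize both records into a single text sequence; the sketch below uses the COL/VAL scheme popularized by Ditto. This is an assumption for illustration, not necessarily the input format DAME uses, and `serialize_record`, `serialize_pair`, and the `[SEP]` separator string are hypothetical choices.

```python
from typing import Dict, Optional


def serialize_record(record: Dict[str, Optional[str]]) -> str:
    """Flatten one record into 'COL <attribute> VAL <value>' segments (hypothetical helper)."""
    parts = []
    for attr, value in record.items():
        # Missing values become empty text so the attribute name is still visible.
        text = "" if value is None else str(value).strip()
        parts.append(f"COL {attr} VAL {text}".rstrip())
    return " ".join(parts)


def serialize_pair(
    left: Dict[str, Optional[str]], right: Dict[str, Optional[str]], sep: str = "[SEP]"
) -> str:
    """Join two serialized records so a pair classifier can score match vs. non-match."""
    return f"{serialize_record(left)} {sep} {serialize_record(right)}"


# Hypothetical product records from two sources.
left = {"title": "iPhone 13 128GB", "brand": "Apple", "price": None}
right = {"title": "Apple iPhone 13 (128 GB)", "brand": "Apple", "price": "799"}
print(serialize_pair(left, right))
```

Keeping attribute names in the text lets a single model read records with different schemas, which matters when pairs from several source domains are pooled for training.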

Code & Models

Repositories

medtray/dame
PyTorch · Official

Taxonomy

Topics: Data Quality and Management · Topic Modeling · Artificial Intelligence in Healthcare