Deep Transfer Learning for Multi-source Entity Linkage via Domain Adaptation

Di Jin; Bunyamin Sisman; Hao Wei; Xin Luna Dong; Danai Koutra

arXiv:2110.14509 · cs.LG · October 28, 2021

TL;DR

This paper introduces AdaMEL, a deep transfer learning framework for multi-source entity linkage that effectively leverages unlabeled data and domain adaptation to improve accuracy and stability over supervised methods.

Contribution

AdaMEL is a novel deep transfer learning approach that models attribute importance and uses domain adaptation to generalize entity linkage across diverse data sources.
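
To make the idea of attribute importance concrete, here is a minimal sketch of how per-attribute similarities between two records might be combined using softmax-normalized importance weights. This is not the AdaMEL implementation: the function names, the token-level Jaccard similarity, and the fixed importance logits are all assumptions made for illustration. In the paper the importance is learned, and the domain adaptation step that uses unlabeled data is not shown here.

```python
import math


def softmax(logits):
    """Normalize raw importance logits into weights that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def token_jaccard(a, b):
    """Token-level Jaccard similarity (an assumed stand-in for learned features)."""
    tokens_a = set(a.lower().split())
    tokens_b = set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def link_score(rec_a, rec_b, importance_logits):
    """Importance-weighted sum of per-attribute similarities, in [0, 1].

    importance_logits is a hypothetical dict mapping attribute name to a raw
    score; a real system would learn these values rather than fix them by hand.
    """
    attrs = sorted(importance_logits)
    weights = softmax([importance_logits[k] for k in attrs])
    sims = [token_jaccard(rec_a.get(k, ""), rec_b.get(k, "")) for k in attrs]
    return sum(w * s for w, s in zip(weights, sims))


# Hypothetical records from two different sources.
record_a = {"name": "Abbey Road", "artist": "The Beatles"}
record_b = {"name": "Abbey Road Remastered", "artist": "Beatles"}
logits = {"name": 2.0, "artist": 0.5}

score = link_score(record_a, record_b, logits)
print(f"link score: {score:.3f}")
```

Raising the logit for an attribute makes disagreement on that attribute cost more, which is the intuition behind adapting attribute importance when a new source formats some fields differently.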

Findings

01. Achieves an 8.21% improvement over supervised methods.

02. Is more stable and faster in handling multiple data sources.

03. Effectively leverages unlabeled data for domain adaptation.

Abstract

Multi-source entity linkage focuses on integrating knowledge from multiple sources by linking the records that represent the same real world entity. This is critical in high-impact applications such as data cleaning and user stitching. The state-of-the-art entity linkage pipelines mainly depend on supervised learning that requires abundant amounts of training data. However, collecting well-labeled training data becomes expensive when the data from many sources arrives incrementally over time. Moreover, the trained models can easily overfit to specific data sources, and thus fail to generalize to new sources due to significant differences in data and label distributions. To address these challenges, we present AdaMEL, a deep transfer learning framework that learns generic high-level knowledge to perform multi-source entity linkage. AdaMEL models the attribute importance that is used to…

Code & Models

Repositories

derekdijin/adamel-supplementary (official)

Taxonomy

Topics: Data Quality and Management · Topic Modeling · Data-Driven Disease Surveillance