Deep Transfer Learning for Multi-source Entity Linkage via Domain Adaptation
Di Jin, Bunyamin Sisman, Hao Wei, Xin Luna Dong, Danai Koutra

TL;DR
This paper introduces AdaMEL, a deep transfer learning framework for multi-source entity linkage that effectively leverages unlabeled data and domain adaptation to improve accuracy and stability over supervised methods.
Contribution
AdaMEL is a novel deep transfer learning approach that models attribute importance and uses domain adaptation to generalize entity linkage across diverse data sources.
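To make "attribute importance" concrete, here is a minimal sketch of the general idea: per-attribute similarities for a record pair are combined using softmax-normalized importance weights. It is an illustration under stated assumptions, not the authors' implementation. The function names, the token-level Jaccard similarity, and the importance values are all hypothetical. In a learned model the importance scores would be trained parameters, and the domain adaptation step is not shown.

```python
import math


def softmax(scores):
    """Turn raw importance scores into non-negative weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def token_jaccard(a, b):
    """Jaccard similarity over lowercase whitespace tokens (illustrative choice)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 0.0  # both values missing: treat as no evidence of a match
    return len(ta & tb) / len(ta | tb)


def match_score(rec_a, rec_b, importance):
    """Importance-weighted average of per-attribute similarities."""
    attrs = sorted(importance)
    weights = softmax([importance[k] for k in attrs])
    sims = [token_jaccard(rec_a.get(k, ""), rec_b.get(k, "")) for k in attrs]
    return sum(w * s for w, s in zip(weights, sims))


# Hypothetical importance scores; a trained model would learn these values.
importance = {"title": 2.0, "artist": 1.0, "label": -1.0}

# Two records from different (made-up) sources describing the same song.
a = {"title": "Hey Jude", "artist": "The Beatles", "label": "Apple"}
b = {"title": "hey jude", "artist": "Beatles", "label": "Parlophone Records"}

score = match_score(a, b, importance)
```

Because the weights sum to 1, a low-importance attribute such as `label` can disagree across sources without pulling the overall score down much. That is the intuition behind weighting attributes rather than treating them equally.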
Findings
Achieves an 8.21% average improvement over supervised learning methods.
More stable and faster in handling multiple data sources.
Effectively leverages unlabeled data for domain adaptation.
Abstract
Multi-source entity linkage focuses on integrating knowledge from multiple sources by linking the records that represent the same real-world entity. This is critical in high-impact applications such as data cleaning and user stitching. The state-of-the-art entity linkage pipelines mainly depend on supervised learning that requires large amounts of training data. However, collecting well-labeled training data becomes expensive when the data from many sources arrives incrementally over time. Moreover, the trained models can easily overfit to specific data sources, and thus fail to generalize to new sources due to significant differences in data and label distributions. To address these challenges, we present AdaMEL, a deep transfer learning framework that learns generic high-level knowledge to perform multi-source entity linkage. AdaMEL models the attribute importance that is used to…
Taxonomy
Topics: Data Quality and Management · Topic Modeling · Data-Driven Disease Surveillance
