Reuse and Adaptation for Entity Resolution through Transfer Learning
Saravanan Thirumuruganathan, Shameem A Puthiya Parambath, Mourad Ouzzani, Nan Tang, Shafiq Joty

TL;DR
This paper explores transfer learning for entity resolution, enabling classifiers trained on one dataset to be adapted for use on related datasets with limited or no training data, reducing manual effort.
Contribution
It introduces a distributed representation approach and five algorithms for effective reuse and adaptation of training data across related datasets in entity resolution.
Findings
The proposed algorithms outperform baseline methods across diverse datasets.
Performance improves significantly when training data is limited.
Effective transfer learning reduces the need for manual feature engineering.
Abstract
Entity resolution (ER) is one of the fundamental problems in data integration, where machine learning (ML) based classifiers often provide state-of-the-art results. Considerable human effort goes into feature engineering and training data creation. In this paper, we investigate a new problem: given a dataset D_T for ER with limited or no training data, is it possible to train a good ML classifier on D_T by reusing and adapting the training data of a dataset D_S from the same or a related domain? Our major contributions include (1) a distributed-representation-based approach to encode each tuple from diverse datasets into a standard feature space; (2) identification of common scenarios where the reuse of training data can be beneficial; and (3) five algorithms for handling each of the aforementioned scenarios. We have performed comprehensive experiments on 12 datasets from 5 different…
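The idea of encoding tuples from differently structured datasets into one standard feature space, then reusing labeled pairs from D_S on D_T, can be illustrated with a minimal sketch. This is not the paper's method or any of its five algorithms: it shows only naive reuse (train on source pairs, score target pairs). The hashed token vectors stand in for pretrained word embeddings, and every name, parameter, and record below (DIM, token_vector, pair_features, the toy data) is a hypothetical chosen for illustration.

```python
import hashlib
import math
import random

DIM = 16  # hypothetical embedding size, chosen only for this sketch


def token_vector(token, dim=DIM):
    """Deterministic pseudo-random vector per token.

    Stand-in for a pretrained word embedding, which this sketch does not load.
    """
    seed = int(hashlib.md5(token.lower().encode("utf-8")).hexdigest(), 16)
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def tuple_vector(record, dim=DIM):
    """Average token vectors over all attribute values, ignoring attribute names."""
    tokens = [tok for value in record.values() for tok in str(value).split()]
    if not tokens:
        return [0.0] * dim
    vectors = [token_vector(tok, dim) for tok in tokens]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def pair_features(left, right, dim=DIM):
    """Fixed-length features for a candidate pair: element-wise |v_left - v_right|."""
    return [abs(x - y) for x, y in zip(tuple_vector(left, dim), tuple_vector(right, dim))]


def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(features, labels, epochs=200, lr=0.5):
    """Plain stochastic-gradient logistic regression; returns (weights, bias)."""
    weights, bias = [0.0] * len(features[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            error = sigmoid(bias + sum(w * v for w, v in zip(weights, x))) - y
            weights = [w - lr * error * v for w, v in zip(weights, x)]
            bias -= lr * error
    return weights, bias


def match_probability(model, left, right):
    """Score a candidate pair with a model trained on another dataset's pairs."""
    weights, bias = model
    x = pair_features(left, right, len(weights))
    return sigmoid(bias + sum(w * v for w, v in zip(weights, x)))


# Labeled pairs from a made-up source dataset D_S (bibliographic schema).
source_pairs = [
    ({"title": "deep learning for entity matching", "venue": "sigmod"},
     {"title": "deep learning for entity matching", "venue": "sigmod conf"}, 1),
    ({"title": "query optimization in databases", "venue": "vldb"},
     {"title": "query optimization in databases", "venue": "vldb"}, 1),
    ({"title": "deep learning for entity matching", "venue": "sigmod"},
     {"title": "query optimization in databases", "venue": "vldb"}, 0),
    ({"title": "graph neural networks survey", "venue": "tkde"},
     {"title": "streaming joins at scale", "venue": "icde"}, 0),
]
model = train_logreg(
    [pair_features(a, b) for a, b, _ in source_pairs],
    [label for _, _, label in source_pairs],
)

# Unlabeled pair from a made-up target dataset D_T with a different schema.
target_left = {"name": "acme wireless mouse", "brand": "acme"}
target_right = {"name": "acme wireless mouse black", "brand": "acme"}
score = match_probability(model, target_left, target_right)
```

Because pair_features ignores attribute names and always returns DIM values, the classifier trained on the bibliographic pairs can be applied unchanged to the product pair. How well such naive reuse works in practice depends on how similar the two datasets are, which is the kind of scenario distinction the abstract refers to.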
Taxonomy
Topics: Data Quality and Management · Anomaly Detection Techniques and Applications · Topic Modeling
