Gradual Machine Learning for Entity Resolution
Boyi Hou, Qun Chen, Yanyan Wang, Youcef Nafa, Zhanhuai Li

TL;DR
This paper introduces gradual machine learning, a new paradigm for entity resolution that avoids manual labeling: the machine first labels easy instances automatically and then iteratively labels more challenging ones, achieving results competitive with supervised methods.
Contribution
It proposes a learning approach that labels data automatically in stages, from easy to hard, minimizing manual effort; the approach outperforms unsupervised methods and performs comparably to supervised techniques.
Findings
Outperforms unsupervised methods in entity resolution
Achieves results comparable to supervised state-of-the-art
Removes the need for manually labeled training data
Abstract
Usually treated as a classification problem, entity resolution (ER) can be very challenging on real data due to the prevalence of dirty values. State-of-the-art solutions for ER are built on a variety of learning models (most notably deep neural networks), which require large amounts of accurately labeled training data. Unfortunately, high-quality labeled data usually require expensive manual work, and are therefore not readily available in many real scenarios. In this paper, we propose a novel learning paradigm for ER, called gradual machine learning, which aims to enable effective machine labeling without requiring manual labeling effort. It begins with some easy instances in a task, which the machine can label automatically with high accuracy, and then gradually labels more challenging instances by iterative factor graph inference. In gradual machine learning, the…
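The easy-to-hard loop described in the abstract can be illustrated with a minimal sketch. It is a simplification under stated assumptions, not the authors' implementation: each candidate record pair is assumed to be a vector of attribute similarity scores in [0, 1]; "easy" pairs are assumed to be those whose mean similarity is at or above a hypothetical `high` threshold (match) or at or below a `low` threshold (non-match); and, for brevity, plain logistic regression stands in for the paper's factor graph inference. The function names (`easy_labels`, `fit`, `gradual_label`) are hypothetical. Each round refits on everything labeled so far and commits only the single most certain (lowest-entropy) unlabeled pair.

```python
import math
from typing import Dict, List, Tuple

# Assumed representation: a candidate pair is a list of attribute similarities in [0, 1].
Pair = List[float]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def entropy(p: float) -> float:
    """Binary entropy of a match probability; 0 means fully certain."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def score(w: List[float], b: float, f: Pair) -> float:
    """Linear score w . f + b."""
    return sum(wd * x for wd, x in zip(w, f)) + b


def easy_labels(pairs: List[Pair], high: float = 0.9, low: float = 0.1) -> Dict[int, int]:
    """Hypothetical easy-instance rule: mean similarity >= high -> match (1),
    <= low -> non-match (0); all other pairs stay unlabeled."""
    labels: Dict[int, int] = {}
    for i, f in enumerate(pairs):
        s = sum(f) / len(f)
        if s >= high:
            labels[i] = 1
        elif s <= low:
            labels[i] = 0
    return labels


def fit(pairs: List[Pair], labels: Dict[int, int],
        epochs: int = 500, lr: float = 0.5) -> Tuple[List[float], float]:
    """Stand-in for factor graph weight learning: logistic regression trained
    by batch gradient descent on the pairs labeled so far."""
    dim = len(pairs[0])
    w, b = [0.0] * dim, 0.0
    n = len(labels)
    for _ in range(epochs):
        gw, gb = [0.0] * dim, 0.0
        for i, y in labels.items():
            err = sigmoid(score(w, b, pairs[i])) - y
            for d in range(dim):
                gw[d] += err * pairs[i][d]
            gb += err
        w = [wd - lr * g / n for wd, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def gradual_label(pairs: List[Pair], high: float = 0.9, low: float = 0.1) -> List[int]:
    """Label easy pairs first, then repeatedly refit and commit the single
    lowest-entropy (most certain) unlabeled pair until every pair is labeled."""
    labels = easy_labels(pairs, high, low)
    if set(labels.values()) != {0, 1}:
        raise ValueError("need at least one easy match and one easy non-match")
    while len(labels) < len(pairs):
        w, b = fit(pairs, labels)
        unlabeled = [i for i in range(len(pairs)) if i not in labels]
        probs = {i: sigmoid(score(w, b, pairs[i])) for i in unlabeled}
        best = min(unlabeled, key=lambda i: entropy(probs[i]))
        labels[best] = 1 if probs[best] >= 0.5 else 0
    return [labels[i] for i in range(len(pairs))]
```

For instance, with pairs `[[0.95, 0.97], [0.92, 0.9], [0.05, 0.02], [0.1, 0.0], [0.8, 0.85], [0.15, 0.2]]`, the first four are labeled by the easy-instance rule and the last two are labeled by the iterative loop. Committing one pair per round keeps each new label backed by all earlier ones, which is the intuition behind labeling gradually; a practical system would likely update the model incrementally rather than refit from scratch each round.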
Taxonomy
Topics: Data Quality and Management · Anomaly Detection Techniques and Applications · Privacy-Preserving Technologies in Data
