DeepER -- Deep Entity Resolution
Muhammad Ebraheem, Saravanan Thirumuruganathan, Shafiq Joty, Mourad Ouzzani, Nan Tang

TL;DR
DeepER leverages deep learning and distributed representations to improve entity resolution accuracy, efficiency, and usability, significantly reducing human effort and feature engineering compared to traditional methods.
Contribution
This paper introduces DeepER, a novel ER system using RNNs with LSTM units and LSH-based blocking, enabling high accuracy and efficiency with minimal human involvement.
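As a rough illustration of what LSH-based blocking means here, the sketch below hashes tuple vectors with random hyperplanes and only pairs up tuples that land in the same bucket. It is a minimal sketch under stated assumptions, not DeepER's implementation: the random-hyperplane scheme, the function names, and the toy vectors are all hypothetical choices made for this example.

```python
import random
from collections import defaultdict


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random hyperplanes (assumed scheme, not from the paper)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def lsh_signature(vector, hyperplanes):
    """One bit per hyperplane: 1 if the vector lies on its non-negative side."""
    return tuple(
        1 if sum(h * v for h, v in zip(plane, vector)) >= 0 else 0
        for plane in hyperplanes
    )


def block_by_lsh(vectors, hyperplanes):
    """Group tuple ids by signature; each bucket acts as one block."""
    buckets = defaultdict(list)
    for tuple_id, vec in vectors.items():
        buckets[lsh_signature(vec, hyperplanes)].append(tuple_id)
    return dict(buckets)


def candidate_pairs(buckets):
    """Only tuples sharing a bucket are compared, avoiding the full cross product."""
    pairs = set()
    for ids in buckets.values():
        for i in range(len(ids)):
            for j in range(i + 1, len(ids)):
                pairs.add((ids[i], ids[j]))
    return pairs


# Hypothetical tuple vectors: "a" and "b" point the same way, "c" the opposite way.
vectors = {"a": [1.0, 2.0], "b": [2.0, 4.0], "c": [-1.0, -2.0]}
planes = make_hyperplanes(num_bits=4, dim=2)
pairs = candidate_pairs(block_by_lsh(vectors, planes))
```

More bits per signature make buckets smaller and comparisons cheaper, at the cost of missing more true matches; practical LSH setups usually offset this with several independent hash tables.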
Findings
DeepER outperforms existing ER solutions on multiple datasets.
It achieves high accuracy with less human-labeled data.
The system significantly reduces feature engineering efforts.
Abstract
Entity resolution (ER) is a key data integration problem. Despite 70+ years of effort on all aspects of ER, there is still a high demand for democratizing ER: humans are heavily involved in labeling data, performing feature engineering, tuning parameters, and defining blocking functions. With the recent advances in deep learning, in particular distributed representations of words (a.k.a. word embeddings), we present a novel ER system, called DeepER, that achieves good accuracy, high efficiency, as well as ease-of-use (i.e., much less human effort). For accuracy, we use sophisticated composition methods, namely uni- and bi-directional recurrent neural networks (RNNs) with long short-term memory (LSTM) hidden units, to convert each tuple to a distributed representation (i.e., a vector), which can in turn be used to effectively capture similarities between tuples. We consider both…
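To make the idea of turning a tuple into a vector and comparing vectors concrete, here is a minimal sketch. It assumes a much simpler composition than the one the abstract names: word vectors are averaged rather than fed through an RNN with LSTM units. The toy embedding table, the function names, and the example tuples are all hypothetical.

```python
import math

# Hypothetical 3-dimensional word embeddings, for illustration only.
TOY_EMBEDDINGS = {
    "apple": [0.9, 0.1, 0.0],
    "iphone": [0.8, 0.2, 0.1],
    "samsung": [0.1, 0.9, 0.0],
    "galaxy": [0.0, 0.8, 0.2],
    "phone": [0.5, 0.5, 0.1],
}


def tuple_to_vector(attribute_values, embeddings, dim=3):
    """Average the embeddings of all known tokens across a tuple's attributes.

    Averaging is a stand-in for the RNN/LSTM composition, not the same thing.
    """
    tokens = [tok for value in attribute_values for tok in value.lower().split()]
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return [0.0] * dim  # no known token: fall back to the zero vector
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


t1 = tuple_to_vector(["Apple iPhone", "phone"], TOY_EMBEDDINGS)
t2 = tuple_to_vector(["iPhone", "Apple"], TOY_EMBEDDINGS)
t3 = tuple_to_vector(["Samsung Galaxy", "phone"], TOY_EMBEDDINGS)

sim_same = cosine_similarity(t1, t2)   # two descriptions of the same product
sim_other = cosine_similarity(t1, t3)  # a different product
```

In this toy setup the two descriptions of the same product score higher than the cross-product pair, which is the property a downstream match classifier would rely on.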
Taxonomy
Topics: Data Quality and Management · Topic Modeling · Machine Learning in Healthcare
