# ZeroER: Entity Resolution using Zero Labeled Examples

**Authors:** Renzhi Wu, Sanya Chaba, Saurabh Sawlani, Xu Chu, Saravanan Thirumuruganathan

arXiv: 1908.06049 · 2020-04-07

## TL;DR

ZeroER is an entity resolution method that matches records without any labeled data by modeling how similarity vectors are distributed for matching versus non-matching pairs. On five benchmark datasets it outperforms existing unsupervised methods and performs comparably to supervised approaches.

## Contribution

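The paper introduces ZeroER, an entity resolution algorithm that requires zero labeled examples. It combines three components: a generative model based on Gaussian Mixture Models that learns the match and unmatch distributions, an adaptive regularization technique that reduces feature overfitting, and the transitivity property built into the generative model to improve accuracy.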

## Key findings

- ZeroER outperforms existing unsupervised ER methods.
- ZeroER achieves comparable results to supervised approaches.
- The approach is validated on five benchmark datasets.

## Abstract

Entity resolution (ER) refers to the problem of matching records in one or more relations that refer to the same real-world entity. While supervised machine learning (ML) approaches achieve state-of-the-art results, they require a large number of labeled examples that are expensive to obtain and often infeasible. We investigate an important problem that vexes practitioners: is it possible to design an effective algorithm for ER that requires Zero labeled examples, yet can achieve performance comparable to supervised approaches? In this paper, we answer in the affirmative through our proposed approach dubbed ZeroER. Our approach is based on a simple observation -- the similarity vectors for matches should look different from those of unmatches. Operationalizing this insight requires a number of technical innovations. First, we propose a simple yet powerful generative model based on Gaussian Mixture Models for learning the match and unmatch distributions. Second, we propose an adaptive regularization technique customized for ER that ameliorates the issue of feature overfitting. Finally, we incorporate the transitivity property into the generative model in a novel way resulting in improved accuracy. On five benchmark ER datasets, we show that ZeroER greatly outperforms existing unsupervised approaches and achieves comparable performance to supervised approaches.
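
The core observation in the abstract, that similarity vectors of matches and unmatches follow different distributions, can be illustrated with a minimal sketch. This is not the authors' implementation: it fits a plain two-component Gaussian mixture with diagonal covariance by EM, it uses a fixed variance floor (`var_floor`) as a simple stand-in for the paper's adaptive regularization, and it omits transitivity entirely. The function names, the toy similarity vectors, and the initialization (seeding the match component at the per-feature maximum) are all assumptions made for illustration.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of x under a Gaussian with diagonal covariance."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def fit_match_posteriors(vectors, n_iter=50, var_floor=1e-3):
    """Fit a 2-component diagonal GMM with EM; return P(match) per vector.

    Component 0 is seeded at the per-feature maximum (high similarity, so
    "match"), component 1 at the per-feature minimum ("unmatch").
    var_floor is a hypothetical fixed regularizer, not the paper's
    adaptive regularization.
    """
    n, dim = len(vectors), len(vectors[0])
    cols = list(zip(*vectors))
    col_means = [sum(c) / n for c in cols]
    col_vars = [max(sum((v - mu) ** 2 for v in c) / n, var_floor)
                for c, mu in zip(cols, col_means)]
    means = [[max(c) for c in cols], [min(c) for c in cols]]
    variances = [list(col_vars), list(col_vars)]
    weights = [0.5, 0.5]
    resp = [[0.5, 0.5] for _ in vectors]

    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each pair,
        # normalized in log space to avoid underflow.
        resp = []
        for x in vectors:
            logs = [math.log(weights[k]) + log_gauss_diag(x, means[k], variances[k])
                    for k in range(2)]
            top = max(logs)
            exps = [math.exp(l - top) for l in logs]
            total = sum(exps)
            resp.append([e / total for e in exps])

        # M-step: re-estimate weights, means, and floored variances.
        for k in range(2):
            nk = max(sum(r[k] for r in resp), 1e-12)
            weights[k] = nk / n
            means[k] = [sum(r[k] * x[d] for r, x in zip(resp, vectors)) / nk
                        for d in range(dim)]
            variances[k] = [
                max(sum(r[k] * (x[d] - means[k][d]) ** 2
                        for r, x in zip(resp, vectors)) / nk, var_floor)
                for d in range(dim)
            ]

    return [r[0] for r in resp]

# Toy similarity vectors (name similarity, address similarity) for nine
# candidate record pairs; the first three are intended to be true matches.
pairs = [
    (0.95, 0.90), (0.92, 0.88), (0.97, 0.93),
    (0.10, 0.20), (0.15, 0.05), (0.20, 0.25),
    (0.05, 0.10), (0.30, 0.15), (0.12, 0.22),
]
posteriors = fit_match_posteriors(pairs)
predicted_matches = [i for i, p in enumerate(posteriors) if p > 0.5]
```

No labels are used at any point: the mixture separates the two groups purely from the shape of the similarity distribution, and thresholding the posterior at 0.5 yields the predicted matches. On real data the match component is typically tiny relative to the unmatch component, which is the kind of setting where a fixed variance floor tends to be too crude and a more careful regularizer, such as the one the paper proposes, becomes relevant.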

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1908.06049/full.md

## Figures

10 figures with captions in the complete paper: https://tomesphere.com/paper/1908.06049/full.md

## References

58 references — full list in the complete paper: https://tomesphere.com/paper/1908.06049/full.md

---
Source: https://tomesphere.com/paper/1908.06049