Improving Machine-based Entity Resolution with Limited Human Effort: A Risk Perspective

Zhaoqiang Chen; Qun Chen; Boyi Hou; Murtadha Ahmed; Zhanhuai Li

arXiv:1805.12502·cs.DB·August 15, 2018

TL;DR

This paper introduces a risk-based approach to improve machine-based entity resolution by selectively involving human verification on high-risk instances, leading to higher accuracy with limited human effort.

Contribution

It proposes a novel risk model that effectively identifies high-risk instances for manual review, enhancing entity resolution accuracy under limited human effort.

Findings

01. The risk model outperforms existing methods in identifying mislabeled instances.

02. It achieves better resolution quality than active learning approaches given the same human effort.

03. Experimental results on real data validate the effectiveness of the proposed approach.

Abstract

Pure machine-based solutions usually struggle with challenging classification tasks such as entity resolution (ER). To alleviate this problem, a recent trend is to involve humans in the resolution process, most notably through crowdsourcing. However, it remains very challenging to effectively improve machine-based entity resolution with limited human effort. In this paper, we investigate the problem of human and machine cooperation for ER from a risk perspective. We propose to select the machine-labeled instances at high risk of being mislabeled for manual verification. For this task, we present a risk model that takes into consideration the human-labeled instances as well as the output of machine resolution. Finally, we evaluate the performance of the proposed risk model on real data. Our experiments demonstrate that it can pick up the mislabeled instances with considerably…
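The workflow the abstract describes (label pairs by machine, rank them by risk of being mislabeled, send the riskiest ones to a human within a fixed budget) can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not give the risk model, so `placeholder_risk` substitutes a simple uncertainty score (highest when the match probability is near 0.5), and all function names, the 0.5 threshold, and the toy data are hypothetical.

```python
from typing import Callable, List, Sequence


def machine_labels(match_probs: Sequence[float], threshold: float = 0.5) -> List[bool]:
    """Label a record pair as a match when its probability reaches the threshold."""
    return [p >= threshold for p in match_probs]


def placeholder_risk(match_probs: Sequence[float]) -> List[float]:
    """Stand-in risk score (NOT the paper's risk model): 1 at p = 0.5, 0 at p = 0 or 1."""
    return [1.0 - abs(2.0 * p - 1.0) for p in match_probs]


def select_for_verification(risks: Sequence[float], budget: int) -> List[int]:
    """Return the indices of the `budget` highest-risk instances, riskiest first."""
    ranked = sorted(range(len(risks)), key=lambda i: risks[i], reverse=True)
    return ranked[:max(budget, 0)]


def verify(labels: Sequence[bool], chosen: Sequence[int],
           ask_human: Callable[[int], bool]) -> List[bool]:
    """Overwrite the machine label of each chosen instance with the human answer."""
    corrected = list(labels)
    for i in chosen:
        corrected[i] = ask_human(i)
    return corrected


if __name__ == "__main__":
    # Toy data: classifier match probabilities and hypothetical ground truth.
    probs = [0.95, 0.55, 0.10, 0.48, 0.80]
    truth = [True, False, False, True, True]

    labels = machine_labels(probs)
    chosen = select_for_verification(placeholder_risk(probs), budget=2)
    fixed = verify(labels, chosen, lambda i: truth[i])
    print("verified indices:", chosen)
    print("all labels correct after verification:", fixed == truth)
```

In this toy run the two machine errors happen to be the two pairs closest to the threshold, so a budget of two fixes both. A risk model such as the one the paper proposes would replace `placeholder_risk` and, per the abstract, would also draw on the human-labeled instances rather than on classifier output alone.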
