Towards Interpretable and Learnable Risk Analysis for Entity Resolution

Zhaoqiang Chen; Qun Chen; Boyi Hou; Tianyi Duan; Zhanhuai Li; Guoliang Li

arXiv:1912.02947 · cs.DB · December 9, 2019


TL;DR

This paper introduces an interpretable, learnable framework for risk analysis in entity resolution, effectively identifying potentially mislabeled entity pairs with higher accuracy than existing methods.

Contribution

It proposes a novel risk analysis framework with automatic feature generation and a learnable model, addressing a gap in interpretability and accuracy in entity resolution.

Findings

01

The risk model outperforms existing methods in identifying mislabeled pairs.

02

Automatic generation of interpretable risk features enhances model transparency.

03

Empirical evaluation on real data demonstrates high accuracy of the proposed approach.

Abstract

Machine-learning-based entity resolution has been widely studied. However, machine learning models may mislabel some entity pairs, and existing studies do not address the risk analysis problem: predicting and interpreting which entity pairs are mislabeled. In this paper, we propose an interpretable and learnable framework for risk analysis, which aims to rank the labeled pairs by their risk of being mislabeled. We first describe how to automatically generate interpretable risk features, and then present a learnable risk model and its training technique. Finally, we empirically evaluate the performance of the proposed approach on real data. Our extensive experiments show that the learned risk model identifies mislabeled pairs with considerably higher accuracy than the existing alternatives.
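To make the ranking task in the abstract concrete, here is a minimal sketch of the problem setup only. It is not the paper's learnable risk model: it uses a naive ambiguity score (how close a classifier's match probability is to 0.5) as a stand-in risk measure. The function name `rank_by_risk`, the 0.5 decision threshold, and the input format are all assumptions made for illustration.

```python
from typing import List, Tuple

Pair = Tuple[str, str]


def rank_by_risk(
    pairs: List[Pair], match_probs: List[float]
) -> List[Tuple[Pair, int, float]]:
    """Rank machine-labeled pairs by a naive risk score (illustrative only).

    Assumption: each pair has a classifier match probability p in [0, 1].
    The machine label is 1 (match) if p >= 0.5, otherwise 0 (non-match).
    The stand-in risk is 1 - max(p, 1 - p), which is largest when the
    classifier is least certain. The paper's model is learned from
    interpretable risk features instead; this is only a simple baseline.
    """
    if len(pairs) != len(match_probs):
        raise ValueError("pairs and match_probs must have the same length")

    scored = []
    for pair, p in zip(pairs, match_probs):
        label = 1 if p >= 0.5 else 0
        risk = 1.0 - max(p, 1.0 - p)
        scored.append((pair, label, risk))

    # Highest-risk pairs first, so they can be inspected before the rest.
    return sorted(scored, key=lambda item: item[2], reverse=True)
```

Given such a ranking, a reviewer would check pairs from the top down; the abstract's claim is that a learned risk model places truly mislabeled pairs nearer the top than alternatives of this kind.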

Taxonomy

Topics: Data Quality and Management · Topic Modeling · Data Mining Algorithms and Applications