Deep Indexed Active Learning for Matching Heterogeneous Entity Representations

Arjit Jain; Sunita Sarawagi; Prithviraj Sen

arXiv:2104.03986 · cs.DB · January 19, 2022

TL;DR

This paper introduces DIAL, a scalable active learning method that jointly learns embeddings for entity resolution, improving recall and accuracy while efficiently handling large, heterogeneous datasets.

Contribution

DIAL presents a novel joint embedding learning framework using an Index-By-Committee approach to enhance blocking and matching in active learning for entity resolution.

Findings

01. DIAL achieves higher recall and precision on benchmark datasets.

02. The approach reduces running time compared to existing methods.

03. DIAL is effective in multilingual record matching scenarios.

Abstract

Given two large lists of records, the task in entity resolution (ER) is to find the pairs from the Cartesian product of the lists that correspond to the same real-world entity. Typically, passive learning methods on such tasks require large amounts of labeled data to yield useful models. Active learning is a promising approach for ER in low-resource settings. However, the search space for finding informative samples for the user to label grows quadratically for instance-pair tasks, making active learning hard to scale. Previous work in this setting relies on hand-crafted predicates, pre-trained language model embeddings, or rule learning to prune away unlikely pairs from the Cartesian product. This blocking step can miss important regions of the product space, leading to low recall. We propose DIAL, a scalable active learning approach that jointly learns embeddings to maximize…
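To make the blocking idea in the abstract concrete, here is a minimal sketch of pruning the Cartesian product by keeping only each record's top-k most similar counterparts. It is an illustration under stated assumptions, not DIAL's method: the character-trigram vectors stand in for learned embeddings, the exhaustive similarity scan stands in for a nearest-neighbor index, and the names `trigram_vector`, `cosine`, and `block_top_k`, along with the sample records, are hypothetical.

```python
import math
from collections import Counter


def trigram_vector(text):
    """Toy stand-in embedding: character trigram counts (not a learned embedding)."""
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[gram] for gram, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def block_top_k(left, right, k):
    """Keep, for each left record, only its k most similar right records.

    This scan still compares every pair; a real system would replace it
    with a nearest-neighbor index over the embeddings (assumption).
    """
    right_vecs = [trigram_vector(r) for r in right]
    candidates = []
    for i, record in enumerate(left):
        vec = trigram_vector(record)
        ranked = sorted(
            range(len(right)),
            key=lambda j: cosine(vec, right_vecs[j]),
            reverse=True,
        )
        candidates.extend((i, j) for j in ranked[:k])
    return candidates


# Hypothetical records describing the same products in different styles.
left = [
    "Apple iPhone 12 64GB",
    "Sony WH-1000XM4 headphones",
    "Dell XPS 13 laptop",
]
right = [
    "iphone 12 (64 gb) by apple",
    "dell xps13 notebook",
    "sony wh1000xm4 wireless headphones",
    "Canon EOS R6 camera",
]

candidates = block_top_k(left, right, k=2)
print(len(left) * len(right), "pairs in the Cartesian product")
print(len(candidates), "candidate pairs after blocking")
```

The candidate set grows as the number of left records times k rather than as the product of both list sizes, which is why blocking quality matters: any true match that falls outside the top-k is lost to the matcher, which is the recall problem the abstract describes.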

Code & Models

Repositories

ArjitJ/DIAL (PyTorch, official)
