Adaptive Candidate Generation for Scalable Edge-discovery Tasks on Data Graphs

Mayank Kejriwal

arXiv:1605.00686 · cs.AI · July 4, 2017 · 1 citation

Open Access

TL;DR

This paper introduces a formal framework for learnable DNF blocking schemes on attributed graphs, aiming to reduce quadratic complexity in edge-discovery tasks like entity resolution and link prediction.

Contribution

It develops a graph-theoretic formalism for DNF schemes and explores their learnability within an optimization framework for attributed graphs.

Findings

01. Formalism enables application to heterogeneous attributed graphs

02. Framework demonstrates potential for complexity reduction

03. Empirical case study illustrates practical principles

Abstract

Several 'edge-discovery' applications over graph-based data models are known to have worst-case time complexity that is quadratic in the number of nodes, even if the discovered edges are sparse. One example is the generic link discovery problem between two graphs, which has attracted research interest in several communities. Specific versions of this problem include link prediction in social networks, ontology alignment between metadata-rich RDF data, approximate joins, and entity resolution between instance-rich data. As large datasets continue to proliferate, reducing quadratic complexity to make the task practical is an important research problem. Within the entity resolution community, the problem is commonly referred to as blocking. A particular class of learnable blocking schemes is known as Disjunctive Normal Form (DNF) blocking schemes, and has emerged as state-of-the-art for homogeneous (i.e.…
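To make the idea of a DNF blocking scheme concrete, the following is a minimal sketch, not the paper's formalism or learning procedure. It assumes flat dictionary records rather than an attributed graph, and the helpers `exact`, `tokens`, and `candidate_pairs`, along with the toy records and the hand-written scheme, are hypothetical names and data introduced only for illustration. Each conjunction is applied through an inverted index, so only records that share a blocking key are paired instead of enumerating all pairs.

```python
from collections import defaultdict
from itertools import combinations, product

# Hypothetical key functions: each maps a record to a set of blocking keys.
def exact(attr):
    # One key: the lowercased attribute value (no key if the value is missing).
    return lambda rec: {rec[attr].lower()} if rec.get(attr) else set()

def tokens(attr):
    # One key per whitespace-separated token of the attribute value.
    return lambda rec: set(rec.get(attr, "").lower().split())

def candidate_pairs(records, dnf):
    """Return candidate pairs under a DNF scheme.

    `dnf` is a list of conjunctions (a disjunction of them); each
    conjunction is a list of key functions that must all agree.
    """
    pairs = set()
    for conjunction in dnf:
        index = defaultdict(set)  # composite blocking key -> record ids
        for rid, rec in records.items():
            key_sets = [f(rec) for f in conjunction]
            # Two records share a composite key exactly when they share
            # a key under every predicate of the conjunction.
            for composite in product(*key_sets):
                index[composite].add(rid)
        for block in index.values():
            pairs.update(combinations(sorted(block), 2))
    return pairs

# Toy data (assumed for illustration only).
records = {
    1: {"name": "Jon Smith", "city": "Austin"},
    2: {"name": "John Smith", "city": "Austin"},
    3: {"name": "Jon Doe", "city": "Boston"},
    4: {"name": "Mary Major", "city": "Austin"},
}

# (common name token AND same city) OR (same full name)
scheme = [[tokens("name"), exact("city")], [exact("name")]]

result = candidate_pairs(records, scheme)
print(result)
```

On this toy input the scheme keeps a single candidate pair, records 1 and 2, out of the six possible pairs. In a learnable setting the scheme itself would be chosen from training data rather than written by hand as it is here.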

Taxonomy

Topics: Advanced Graph Neural Networks · Data Quality and Management · Complex Network Analysis Techniques