Adaptive Candidate Generation for Scalable Edge-discovery Tasks on Data Graphs
Mayank Kejriwal

TL;DR
This paper introduces a formal framework for learnable DNF blocking schemes on attributed graphs, aiming to reduce quadratic complexity in edge-discovery tasks like entity resolution and link prediction.
Contribution
It develops a graph-theoretic formalism for DNF schemes and explores their learnability within an optimization framework for attributed graphs.
Findings
Formalism enables application to heterogeneous attributed graphs
Framework demonstrates potential for complexity reduction
Empirical case study illustrates practical principles
Abstract
Several 'edge-discovery' applications over graph-based data models are known to have worst-case quadratic time complexity in the nodes, even if the discovered edges are sparse. One example is the generic link discovery problem between two graphs, which has invited research interest in several communities. Specific versions of this problem include link prediction in social networks, ontology alignment between metadata-rich RDF data, approximate joins, and entity resolution between instance-rich data. As large datasets continue to proliferate, reducing quadratic complexity to make the task practical is an important research problem. Within the entity resolution community, the problem is commonly referred to as blocking. A particular class of learnable blocking schemes is known as Disjunctive Normal Form (DNF) blocking schemes, and has emerged as state-of-the-art for homogeneous (i.e.…
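To make the idea of a DNF blocking scheme concrete, the following is a minimal sketch of generic DNF blocking on a toy set of records, not the paper's formalism or learning procedure. The records, attribute names, and the helper functions `first_token`, `last_token`, `exact`, and `candidate_pairs` are all hypothetical and chosen only for illustration. A scheme is written as a disjunction of conjunctions of blocking predicates; two records become a candidate pair if they agree on every predicate key in at least one conjunction.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy records; the attribute names are illustrative assumptions.
records = {
    1: {"name": "Ann Lee", "city": "Austin"},
    2: {"name": "Ann Li", "city": "Austin"},
    3: {"name": "Bob Ray", "city": "Boston"},
    4: {"name": "Rob Ray", "city": "Boston"},
    5: {"name": "Cy Dunn", "city": "Denver"},
}

# Each blocking predicate maps a record to a single blocking key.
def first_token(attr):
    return lambda rec: rec[attr].split()[0].lower()

def last_token(attr):
    return lambda rec: rec[attr].split()[-1].lower()

def exact(attr):
    return lambda rec: rec[attr].lower()

# DNF scheme: outer list is a disjunction, each inner list is a conjunction.
scheme = [
    [first_token("name"), exact("city")],  # same first token AND same city
    [last_token("name")],                  # OR same last token
]

def candidate_pairs(records, scheme):
    """Return the set of record-id pairs that share a block under the scheme."""
    pairs = set()
    for conjunction in scheme:
        blocks = defaultdict(list)
        for rid, rec in records.items():
            # A conjunction is satisfied by two records when all keys match,
            # so the composite key is the tuple of the individual keys.
            key = tuple(pred(rec) for pred in conjunction)
            blocks[key].append(rid)
        for ids in blocks.values():
            # Only records inside the same block are paired.
            pairs.update(combinations(sorted(ids), 2))
    return pairs

result = candidate_pairs(records, scheme)
```

On this toy input, only pairs that share a block are ever generated, instead of all 10 possible pairs among the five records. In practice the savings depend on how selective the chosen predicates are, and a single oversized block can still be expensive.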
Taxonomy
Topics: Advanced Graph Neural Networks · Data Quality and Management · Complex Network Analysis Techniques
