Learning Expressive Linkage Rules using Genetic Programming
Robert Isele, Christian Bizer

TL;DR
This paper introduces GenLink, a genetic programming algorithm that learns expressive linkage rules from existing reference links, outperforming previous genetic programming methods and reaching accuracy comparable to human-crafted rules.
Contribution
The paper presents GenLink, a genetic programming approach that automatically learns expressive linkage rules from a set of existing reference links.
Findings
GenLink outperforms previous genetic programming methods.
GenLink achieves accuracy comparable to human-crafted rules.
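The algorithm learns rules that select discriminative properties, normalize property values through chains of data transformations, and combine multiple comparisons using non-linear aggregation functions.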
Abstract
A central problem in data integration and data cleansing is to find entities in different data sources that describe the same real-world object. Many existing methods for identifying such entities rely on explicit linkage rules which specify the conditions that entities must fulfill in order to be considered to describe the same real-world object. In this paper, we present the GenLink algorithm for learning expressive linkage rules from a set of existing reference links using genetic programming. The algorithm is capable of generating linkage rules which select discriminative properties for comparison, apply chains of data transformations to normalize property values, choose appropriate distance measures and thresholds and combine the results of multiple comparisons using non-linear aggregation functions. Our experiments show that the GenLink algorithm outperforms the state-of-the-art…
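The rule structure described in the abstract (property selection, transformation chains, distance measures with thresholds, and non-linear aggregation) can be pictured as a small expression tree. The following is a minimal sketch under stated assumptions, not GenLink's actual implementation: the class names `Comparison` and `Aggregation`, the linear mapping from distance to similarity, the 0.5 decision cutoff, and the example entities are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Union

Entity = Dict[str, str]  # hypothetical: an entity as a property -> value map


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def exact(a: str, b: str) -> float:
    """Distance 0 for identical strings, 1 otherwise."""
    return 0.0 if a == b else 1.0


@dataclass
class Comparison:
    prop_a: str                               # property read from the first entity
    prop_b: str                               # property read from the second entity
    transforms: List[Callable[[str], str]]    # normalization chain, applied in order
    distance: Callable[[str, str], float]     # distance measure
    threshold: float                          # assumed > 0

    def score(self, a: Entity, b: Entity) -> float:
        va, vb = a.get(self.prop_a, ""), b.get(self.prop_b, "")
        for t in self.transforms:
            va, vb = t(va), t(vb)
        d = self.distance(va, vb)
        # Assumed mapping: similarity 1 at distance 0, 0 at or beyond the threshold.
        return max(0.0, 1.0 - d / self.threshold)


@dataclass
class Aggregation:
    function: Callable[[List[float]], float]  # e.g. min or max (non-linear)
    operands: List[Union["Comparison", "Aggregation"]]

    def score(self, a: Entity, b: Entity) -> float:
        return self.function([op.score(a, b) for op in self.operands])


def is_link(rule: Union[Comparison, Aggregation], a: Entity, b: Entity) -> bool:
    # Assumed decision cutoff of 0.5 on the aggregated similarity.
    return rule.score(a, b) >= 0.5


# Hypothetical rule: names must be close after normalization AND countries equal.
rule = Aggregation(min, [
    Comparison("name", "label", [str.strip, str.lower], levenshtein, 3.0),
    Comparison("country", "country", [], exact, 1.0),
])

source = {"name": " Berlin ", "country": "DE"}
match = {"label": "berlin", "country": "DE"}
non_match = {"label": "Bern", "country": "CH"}

print(is_link(rule, source, match))      # True
print(is_link(rule, source, non_match))  # False
```

In this sketch, a learning algorithm would search over which properties, transformations, distance measures, thresholds, and aggregation functions appear in the tree; how GenLink performs that search is not shown here.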
Taxonomy
Topics: Data Mining Algorithms and Applications · Semantic Web and Ontologies · Machine Learning and Data Classification
