Bipartite Graph Matching Algorithms for Clean-Clean Entity Resolution: An Empirical Evaluation
George Papadakis, Vasilis Efthymiou, Emanouil Thanos, Oktie Hassanzadeh

TL;DR
This paper empirically evaluates eight bipartite graph matching algorithms for Clean-Clean entity resolution, comparing their accuracy and time efficiency on 10 real datasets (more than 700 similarity graphs) to guide algorithm selection.
Contribution
It provides a comprehensive empirical comparison of eight bipartite graph matching algorithms for Clean-Clean ER, including some that had not previously been applied to ER or had only been evaluated in other ER settings.
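To make the input and output of such algorithms concrete, here is a minimal Python sketch of one of the simplest possible baselines: drop edges below a similarity threshold, compute connected components, and accept a component as a match only if it contains exactly one entity from each source. The function name, the (left id, right id, similarity) edge format, the 0.5 default threshold, and the two-node acceptance rule are all illustrative assumptions; this is not claimed to be how the paper implements any of its algorithms.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# An edge (left_id, right_id, similarity) of the bipartite similarity graph.
Edge = Tuple[str, str, float]
# Nodes are tagged with their source ("L" or "R") so ids cannot collide.
Node = Tuple[str, str]


def connected_components_matching(
    edges: List[Edge], threshold: float = 0.5
) -> List[Tuple[str, str]]:
    """Match entities through connected components of the thresholded graph.

    Edges below `threshold` are dropped, the remaining ones are merged with a
    union-find structure, and only components holding exactly one entity from
    each source are reported as matches; larger components are ambiguous for
    Clean-Clean ER and are discarded in this sketch.
    """
    parent: Dict[Node, Node] = {}

    def find(x: Node) -> Node:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for left, right, sim in edges:
        if sim >= threshold:
            root_l, root_r = find(("L", left)), find(("R", right))
            if root_l != root_r:
                parent[root_l] = root_r

    components: Dict[Node, List[Node]] = defaultdict(list)
    for node in list(parent):
        components[find(node)].append(node)

    matches = []
    for nodes in components.values():
        # Every edge joins a left and a right node, so a 2-node component
        # is exactly one cross-source pair; ("L", ...) sorts before ("R", ...).
        if len(nodes) == 2:
            (_, left), (_, right) = sorted(nodes)
            matches.append((left, right))
    return sorted(matches)


graph = [("a1", "b1", 0.9), ("a1", "b2", 0.8), ("a2", "b1", 0.85),
         ("a2", "b2", 0.4), ("a3", "b3", 0.3)]
print(connected_components_matching(graph, threshold=0.3))  # [('a3', 'b3')]
```

In the example, a chain of high-similarity edges merges a1, a2, b1 and b2 into one four-node component, which this sketch drops entirely; handling such ambiguity is where algorithms that enforce one-to-one matches can help.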
Findings
Certain algorithms outperform others in accuracy depending on data characteristics.
Trade-offs between accuracy and computational efficiency are identified.
Guidelines are given for selecting the most suitable algorithm based on dataset properties.
Abstract
Entity Resolution (ER) is the task of finding records that refer to the same real-world entities. A common scenario is when entities across two clean sources need to be resolved, which we refer to as Clean-Clean ER. In this paper, we perform an extensive empirical evaluation of 8 bipartite graph matching algorithms that take as input a bipartite similarity graph and provide as output a set of matched entities. We consider a wide range of matching algorithms, including algorithms that have not previously been applied to ER, or have been evaluated only in other ER settings. We assess the relative performance of the algorithms with respect to accuracy and time efficiency over 10 established, real datasets, from which we extract more than 700 different similarity graphs. Our results provide insights into the relative performance of these algorithms and guidelines for choosing the best one,…
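The abstract describes the interface these algorithms share: a weighted bipartite similarity graph goes in, a set of matched entity pairs comes out. Below is a minimal Python sketch of one simple greedy strategy that respects the Clean-Clean assumption (each entity matches at most one entity in the other source). The edge format, the function name `unique_mapping`, and the 0.5 default threshold are illustrative assumptions, and this is not claimed to be the exact variant evaluated in the paper.

```python
from typing import List, Tuple

# An edge (left_id, right_id, similarity) of the bipartite similarity graph.
Edge = Tuple[str, str, float]


def unique_mapping(edges: List[Edge], threshold: float = 0.5) -> List[Tuple[str, str]]:
    """Greedy one-to-one matching over a bipartite similarity graph.

    Edges are visited in descending similarity order; an edge becomes a
    match only if its weight reaches `threshold` and neither endpoint has
    been matched yet. Both sources are assumed duplicate-free (Clean-Clean ER),
    so each entity can take part in at most one match.
    """
    matched_left = set()
    matched_right = set()
    matches = []
    for left, right, sim in sorted(edges, key=lambda e: e[2], reverse=True):
        if sim < threshold:
            break  # all remaining edges are weaker still
        if left in matched_left or right in matched_right:
            continue  # one endpoint is already taken
        matched_left.add(left)
        matched_right.add(right)
        matches.append((left, right))
    return matches


graph = [("a1", "b1", 0.9), ("a1", "b2", 0.8), ("a2", "b1", 0.85),
         ("a2", "b2", 0.4), ("a3", "b3", 0.3)]
print(unique_mapping(graph))  # [('a1', 'b1')]
```

On this graph the greedy pass keeps only a1-b1 (similarity 0.9), whereas an exact maximum-weight one-to-one assignment over the edges above the threshold would pair a1 with b2 and a2 with b1 (total 1.65). Greedy selection costs little more than one sort over the edges but can miss the globally best assignment, which illustrates the kind of accuracy/efficiency trade-off such comparisons measure.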
Taxonomy
Topics: Data Quality and Management · Access Control and Trust · Privacy-Preserving Technologies in Data
