ERBlox: Combining Matching Dependencies with Machine Learning for Entity Resolution
Zeinab Bahmani, Leopoldo Bertossi, Nikolaos Vasiloglou

TL;DR
ERBlox integrates machine learning classifiers, matching dependencies, and LogiQL to improve entity resolution by supporting blocking and merging processes with a declarative approach.
Contribution
This work introduces a framework that combines machine learning (ML) classifiers, matching dependencies (MDs), and the declarative language LogiQL for entity resolution, and shows the benefits of integrating them.
Findings
Improved accuracy in entity resolution tasks.
Effective support for blocking and merging phases.
Demonstrated advantages of declarative specifications.
Abstract
Entity resolution (ER), an important and common data cleaning problem, is about detecting duplicate data representations of the same external entities and merging them into single representations. Relatively recently, declarative rules called matching dependencies (MDs) have been proposed for specifying similarity conditions under which attribute values in database records are merged. In this work we show the process and the benefits of integrating three components of ER: (a) classifiers for duplicate/non-duplicate record pairs built using machine learning (ML) techniques; (b) MDs for supporting both the blocking phase of ML and the merge itself; and (c) the use of the declarative language LogiQL (an extended form of Datalog supported by the LogicBlox platform) for data processing and for the specification and enforcement of MDs.
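The three-stage flow described in the abstract (blocking, pairwise classification, MD-driven merge) can be illustrated with a minimal Python sketch. This is not the ERBlox implementation: the paper uses LogiQL and trained ML classifiers, whereas everything here is a stand-in chosen for illustration. The records, the `block_key` prefix rule, the string-similarity threshold standing in for a classifier, and the "keep the longer value" matching function are all assumptions. In particular, the paper uses MDs to support blocking, while this sketch uses a plain key.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical toy records; not data from the paper.
records = [
    {"id": 1, "title": "Entity Resolution with Matching Dependencies", "venue": "VLDB"},
    {"id": 2, "title": "Entity resolution with matching dependencies.", "venue": "Very Large Data Bases"},
    {"id": 3, "title": "Query Answering in Datalog", "venue": "PODS"},
]

def similar(a, b, threshold=0.9):
    """Case-insensitive string similarity test (assumed similarity relation)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

def block_key(rec):
    """Crude blocking key (assumption): first six lowercase characters of the title."""
    return rec["title"].lower()[:6]

def candidate_pairs(recs):
    """Blocking: only records that share a block key are compared."""
    blocks = {}
    for rec in recs:
        blocks.setdefault(block_key(rec), []).append(rec)
    for group in blocks.values():
        yield from combinations(group, 2)

def is_duplicate(r1, r2):
    """Stand-in for a trained duplicate/non-duplicate classifier."""
    return similar(r1["title"], r2["title"])

def merge_value(v1, v2):
    """Assumed matching function: keep the longer value, ties broken lexicographically."""
    return max(v1, v2, key=lambda v: (len(v), v))

def resolve(recs, attrs=("title", "venue")):
    """Apply an MD-style rule: if two records are judged duplicates,
    identify (merge) their values on each listed attribute."""
    recs = [dict(r) for r in recs]  # work on copies, leave the input untouched
    for r1, r2 in candidate_pairs(recs):
        if is_duplicate(r1, r2):
            for attr in attrs:
                r1[attr] = r2[attr] = merge_value(r1[attr], r2[attr])
    return recs

resolved = resolve(records)
```

In this sketch, records 1 and 2 land in the same block, are judged duplicates, and end up with identical `title` and `venue` values, while record 3 is never compared against them. A single pass suffices for this toy input; enforcing MDs in general may require repeated application until no further merges occur.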
