ERBlox: Combining Matching Dependencies with Machine Learning for Entity Resolution
Zeinab Bahmani, Leopoldo Bertossi, Nikolaos Vasiloglou

TL;DR
ERBlox integrates matching dependencies with machine learning and declarative programming to improve entity resolution by supporting blocking, classification, and merging of duplicate records.
Contribution
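This work introduces a framework that combines matching dependencies (MDs), machine learning (ML), and the declarative language LogiQL into a single approach to entity resolution, supporting the data cleaning process from blocking through merging.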
Findings
Improved accuracy in duplicate detection.
Efficient blocking and merging processes.
Unified declarative approach for ER tasks.
Abstract
Entity resolution (ER), an important and common data cleaning problem, is about detecting duplicate representations of the same external entities in data and merging them into single representations. Relatively recently, declarative rules called "matching dependencies" (MDs) have been proposed for specifying similarity conditions under which attribute values in database records are merged. In this work we show the process and the benefits of integrating four components of ER: (a) building a classifier for duplicate/non-duplicate record pairs using machine learning (ML) techniques; (b) using MDs to support the blocking phase of ML; (c) merging records on the basis of the classifier results; and (d) using the declarative language "LogiQL" (an extended form of Datalog supported by the "LogicBlox" platform) for all activities related to data processing, and the specification…
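To make the blocking, classification, and merging flow above more concrete, here is a minimal Python sketch under stated assumptions. It is not ERBlox's implementation, which is written in LogiQL on LogicBlox. It assumes a hypothetical blocking MD, "similar titles imply the same block", enforced transitively so that blocks become equivalence classes. It uses `difflib.SequenceMatcher` as a stand-in similarity predicate and replaces the trained ML classifier with a hypothetical threshold rule. For merging it applies a hypothetical "keep the longer value" matching function. All function names, thresholds, and records are illustrative.

```python
from difflib import SequenceMatcher
from itertools import combinations

def similar(a: str, b: str, threshold: float = 0.8) -> bool:
    """Hypothetical similarity predicate: difflib ratio on lowercased strings."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

class UnionFind:
    """Disjoint sets, used to turn pairwise matches into equivalence classes."""
    def __init__(self, n: int):
        self.parent = list(range(n))
    def find(self, x: int) -> int:
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x
    def union(self, x: int, y: int) -> None:
        rx, ry = self.find(x), self.find(y)
        if rx != ry:
            self.parent[ry] = rx

def groups(uf: UnionFind, n: int) -> list:
    """Collect record indices by their set representative."""
    out = {}
    for i in range(n):
        out.setdefault(uf.find(i), []).append(i)
    return list(out.values())

def md_blocking(records: list, key: str = "title") -> list:
    """Hypothetical blocking MD: key1 ~ key2 -> block1 = block2 (applied transitively)."""
    uf = UnionFind(len(records))
    for i, j in combinations(range(len(records)), 2):
        if similar(records[i][key], records[j][key]):
            uf.union(i, j)
    return groups(uf, len(records))

def classify(r1: dict, r2: dict) -> bool:
    """Stand-in for the trained ML classifier (hypothetical threshold rule)."""
    return similar(r1["title"], r2["title"], 0.9) and similar(r1["author"], r2["author"], 0.7)

def merge_values(a: str, b: str) -> str:
    """Hypothetical matching function: keep the longer (assumed more informative) value."""
    return a if len(a) >= len(b) else b

def resolve(records: list) -> list:
    """Classify pairs only within blocks, then merge each duplicate cluster."""
    uf = UnionFind(len(records))
    for block in md_blocking(records):
        for i, j in combinations(block, 2):
            if classify(records[i], records[j]):
                uf.union(i, j)
    merged = []
    for members in groups(uf, len(records)):
        rec = dict(records[members[0]])
        for m in members[1:]:
            for k in rec:
                rec[k] = merge_values(rec[k], records[m][k])
        merged.append(rec)
    return merged

# Toy, made-up records for illustration only.
records = [
    {"title": "Entity Resolution with Matching Dependencies", "author": "J. Doe"},
    {"title": "Entity resolution with matching dependencies.", "author": "J Doe"},
    {"title": "Deep Learning for Images", "author": "A. Smith"},
]
print(md_blocking(records))   # [[0, 1], [2]]
print(len(resolve(records)))  # 2
```

Blocking restricts the classifier to pairs inside the same block, so the unrelated third record is never compared with the first two. Computing clusters with union-find means that merging acts on whole equivalence classes of duplicates rather than on isolated pairs.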
