Enforcing Relational Matching Dependencies with Datalog for Entity Resolution
Zeinab Bahmani, Leopoldo Bertossi

TL;DR
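This paper introduces relational matching dependencies (MDs) for entity resolution, extending existing MDs to capture more application semantics, and identifies classes of relational MDs for which the answer set program specifying the cleaning task can be automatically rewritten into a stratified Datalog program whose standard model is the single clean instance.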
Contribution
It extends MDs to relational MDs that incorporate more application semantics and, for certain classes of relational MDs, automatically rewrites the general answer set program (ASP) for MD-based cleaning into a stratified Datalog program.
Findings
Relational MDs better capture application-specific semantics.
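For certain classes of relational MDs, the general ASP can be automatically rewritten into a stratified Datalog program.
For these classes, entity resolution yields a single clean instance, obtained as the standard model of the rewritten program; stratified Datalog programs can be evaluated in polynomial time in the size of the data.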
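A minimal Python sketch of what "a single standard model of a stratified Datalog program" means. The program, the predicate names (sim, dup, nonrep, rep) and the facts are hypothetical illustrations, not rules taken from the paper. The first stratum is positive and recursive (it closes a base similarity relation into duplicate clusters); the second uses negation only over the already completed first stratum to pick one representative per cluster. Evaluating the strata in order leaves no choice, so exactly one model results.

```python
def stratum1_dup(records, sim):
    """Least fixpoint of the positive, recursive stratum (hypothetical rules):
         dup(X,Y) :- sim(X,Y).     dup(X,Y) :- sim(Y,X).
         dup(X,X) :- rec(X).       dup(X,Z) :- dup(X,Y), dup(Y,Z).
    """
    dup = set(sim) | {(y, x) for (x, y) in sim} | {(r, r) for r in records}
    while True:
        # Naive evaluation: apply the recursive rule to everything derived so far.
        new = {(x, z) for (x, y1) in dup for (y2, z) in dup if y1 == y2}
        if new <= dup:  # nothing new derived: fixpoint reached
            return dup
        dup |= new


def stratum2_rep(records, dup):
    """Second stratum; negation refers only to the completed lower stratum:
         nonrep(Y) :- dup(X,Y), X < Y.
         rep(X)    :- rec(X), not nonrep(X).
    """
    nonrep = {y for (x, y) in dup if x < y}
    return {r for r in records if r not in nonrep}


records = {1, 2, 3, 4, 5}
sim = {(1, 2), (2, 3), (4, 5)}  # hypothetical base similarity facts
dup = stratum1_dup(records, sim)
print(sorted(stratum2_rep(records, dup)))  # [1, 4]
```

Naive re-evaluation is used only for brevity; a Datalog engine would typically use semi-naive evaluation to avoid rederiving known facts.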
Abstract
Entity resolution (ER) is about identifying and merging records in a database that represent the same real-world entity. Matching dependencies (MDs) have been introduced and investigated as declarative rules that specify ER policies. An ER process induced by MDs over a dirty instance leads, in general, to multiple clean instances. General answer set programs (ASPs) have been proposed to specify the MD-based cleaning task and its results. In this work, we extend MDs to "relational MDs", which capture more application semantics, and identify classes of relational MDs for which the general ASP can be automatically rewritten into a stratified Datalog program, with the single clean instance as its standard model.
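To illustrate the ER process induced by an MD, here is a minimal Python sketch that enforces one classical MD, R[name] ≈ R[name] → R[addr] ⇌ R[addr] (tuples with similar names must end up with identical addresses), by merging values until a fixpoint is reached. The relation, the similarity predicate (a shared name token) and the matching function (set union of address tokens, acting as a least upper bound) are assumptions chosen for illustration; this is not the paper's ASP or Datalog encoding. Because this MD compares only name and merges only addr, and union is commutative, associative and idempotent, the result does not depend on the order of merges. When MDs interact, for instance when one merge changes a value that another MD compares, different enforcement orders can produce different clean instances, which is one way multiple clean instances arise in general.

```python
def similar(a: str, b: str) -> bool:
    """Hypothetical similarity predicate: the two names share at least one token."""
    return bool(set(a.lower().split()) & set(b.lower().split()))


def match(u: frozenset, v: frozenset) -> frozenset:
    """Hypothetical matching function: least upper bound (set union) of two values."""
    return u | v


def enforce_md(tuples):
    """Enforce R[name] ~ R[name] -> R[addr] <=> R[addr] until no tuple changes.

    Values only grow under union, and there are finitely many tokens,
    so the loop always terminates.
    """
    rows = [dict(t) for t in tuples]  # work on a copy of the dirty instance
    changed = True
    while changed:
        changed = False
        for i in range(len(rows)):
            for j in range(i + 1, len(rows)):
                ti, tj = rows[i], rows[j]
                if similar(ti["name"], tj["name"]) and ti["addr"] != tj["addr"]:
                    merged = match(ti["addr"], tj["addr"])
                    ti["addr"] = tj["addr"] = merged  # both tuples get the matched value
                    changed = True
    return rows


dirty = [  # hypothetical dirty instance of R(name, addr)
    {"name": "John Smith", "addr": frozenset({"MainSt"})},
    {"name": "J. Smith", "addr": frozenset({"Ottawa"})},
    {"name": "Mary Lee", "addr": frozenset({"KingSt"})},
]
clean = enforce_md(dirty)
for row in clean:
    print(row["name"], sorted(row["addr"]))
```

Running the same function on the tuples in reverse order yields the same set of (name, addr) pairs, which is the order independence described above for this single, non-interacting MD.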
Taxonomy
Topics: Data Quality and Management · Access Control and Trust · Topic Modeling
