Repairing Databases over Metric Spaces with Coincidence Constraints
Youri Kaminsky, Benny Kimelfeld, Ester Livshits, Felix Naumann, and David Wajc

TL;DR
This paper investigates the complexity of repairing inconsistent databases with values in metric spaces, proposing algorithms for special cases and approximation methods for general metrics.
Contribution
It introduces an algorithm for optimal repair in tree metrics and a logarithmic approximation for general metrics, along with complexity results for constrained repairs.
Findings
Optimal repair algorithm for tree metrics.
Logarithmic approximation for general metrics.
NP-completeness of constrained repair decision problem.
Abstract
Datasets often contain values that naturally reside in a metric space: numbers, strings, geographical locations, machine-learned embeddings in a Euclidean space, and so on. We study the computational complexity of repairing inconsistent databases that violate integrity constraints, where the database values belong to an underlying metric space. The goal is to update the database values to retain consistency while minimizing the total distance between the original values and the repaired ones. We consider what we refer to as \emph{coincidence constraints}, which include key constraints, inclusion, foreign keys, and generally any restriction on the relationship between the numbers of cells of different labels (attributes) coinciding in a single value, for a fixed attribute set. We begin by showing that the problem is APX-hard for general metric spaces. We then present an algorithm…
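To make the problem statement concrete, the following is a minimal sketch of what checking a coincidence constraint and pricing a candidate repair could look like. It is an illustration only, not the paper's algorithm: the cell representation as (label, value) pairs, the helper names `coincidence_counts`, `is_consistent`, and `repair_cost`, the two example constraints, and the choice of the real line as the metric are all assumptions made here.

```python
from collections import Counter, defaultdict

def coincidence_counts(cells):
    """Map each value to a Counter of the labels of the cells holding it."""
    counts = defaultdict(Counter)
    for label, value in cells:
        counts[value][label] += 1
    return counts

def is_consistent(cells, constraint):
    """A database is consistent if the constraint holds at every occupied value.

    `constraint` is a hypothetical predicate over the per-value label counts.
    """
    return all(constraint(c) for c in coincidence_counts(cells).values())

def repair_cost(original, repaired, dist):
    """Total distance between original and repaired values, cell by cell."""
    if len(original) != len(repaired):
        raise ValueError("a repair updates values; it does not add or drop cells")
    total = 0
    for (label_o, v), (label_r, w) in zip(original, repaired):
        if label_o != label_r:
            raise ValueError("a repair must not change cell labels")
        total += dist(v, w)
    return total

# Two illustrative coincidence constraints (assumed encodings):
# key on A: at most one A-cell may sit on any single value.
key_on_A = lambda c: c["A"] <= 1
# inclusion A in B: any value holding an A-cell must also hold a B-cell.
inclusion_A_in_B = lambda c: c["A"] == 0 or c["B"] >= 1

# Metric assumed for the example: absolute difference on the real line.
line_dist = lambda x, y: abs(x - y)

cells = [("A", 1.0), ("A", 3.0), ("B", 2.0)]
repaired = [("A", 2.0), ("A", 2.0), ("B", 2.0)]
```

In this toy instance the original cells violate the inclusion constraint, and moving both A-cells onto the B-cell restores it at a total cost of 2.0. The same repaired database violates the key constraint on A, which shows how the choice of constraint changes which repairs are admissible.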
