Analyzing Deviations from Monotonic Trends through Database Repair
Shunit Agmon, Jonathan Gal, Amir Gilad, Ester Livshits, Or Mutay, Brit Youngmann, Benny Kimelfeld

TL;DR
This paper introduces Aggregate Order Dependencies (AODs) to quantify deviations from monotonic trends in datasets, and proposes algorithms and heuristics that find a small set of tuples to delete so that the expected ordering is restored.
Contribution
It extends order dependencies to aggregate functions, formulates the AOD repair problem, and develops algorithms and heuristics with practical efficiency for real-world datasets.
Findings
Algorithms effectively repair datasets to satisfy AODs
Heuristics provide faster approximate solutions
Experimental results show practical efficiency and insight into data violations
Abstract
Datasets often exhibit violations of expected monotonic trends - for example, higher education level correlating with higher average salary, newer homes being more expensive, or diabetes prevalence increasing with age. We address the problem of quantifying how far a dataset deviates from such trends. To this end, we introduce Aggregate Order Dependencies (AODs), an aggregation-centric extension of the previously studied order dependencies. An AOD specifies that the aggregated value of a target attribute (e.g., mean salary) should monotonically increase or decrease with the grouping attribute (e.g., education level). We formulate the AOD repair problem as finding the smallest set of tuples to delete from a table so that the given AOD is satisfied. We analyze the computational complexity of this problem and propose a general algorithmic template for solving it. We instantiate the…
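To make the definitions in the abstract concrete, here is a minimal sketch of an AOD check and a brute-force deletion repair. It is not the paper's algorithm: the function names `satisfies_aod` and `brute_force_repair`, the `(group, value)` row layout, the choice of a non-strict (non-decreasing) ordering, and the convention that a group emptied by deletions simply drops out of the comparison are all assumptions made for illustration. The exhaustive search is exponential in the table size and only suits toy inputs.

```python
from itertools import combinations
from statistics import mean


def satisfies_aod(rows, agg=mean):
    """Check an (assumed) increasing AOD on rows of (group, value).

    Returns True if agg(value) is non-decreasing as the group key increases.
    Groups with no remaining rows are ignored.
    """
    groups = {}
    for group, value in rows:
        groups.setdefault(group, []).append(value)
    aggregated = [agg(groups[g]) for g in sorted(groups)]
    return all(a <= b for a, b in zip(aggregated, aggregated[1:]))


def brute_force_repair(rows, agg=mean):
    """Return indices of a smallest set of rows whose deletion satisfies the AOD.

    Tries deletion sets in order of increasing size, so the first hit is minimal.
    Illustration only: the running time is exponential in len(rows).
    """
    n = len(rows)
    for k in range(n + 1):
        for drop in combinations(range(n), k):
            dropped = set(drop)
            kept = [r for i, r in enumerate(rows) if i not in dropped]
            if satisfies_aod(kept, agg):
                return list(drop)
    return list(range(n))  # unreachable: the empty table always satisfies the AOD


# Hypothetical data: (education level, salary in thousands).
rows = [(1, 30), (1, 40), (2, 20), (2, 100), (3, 50), (3, 60)]
violated = not satisfies_aod(rows)      # group means are 35, 60, 55
to_delete = brute_force_repair(rows)    # indices of rows to remove
```

The size of the returned deletion set can be read as a distance from the trend: the fewer tuples that must be removed, the closer the table is to satisfying the AOD.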
Taxonomy
Topics: Data Quality and Management · Data Mining Algorithms and Applications · Advanced Database Systems and Queries
