Conformance Constraint Discovery: Measuring Trust in Data-Driven Systems
Anna Fariha, Ashish Tiwari, Arjun Radhakrishna, Sumit Gulwani, Alexandra Meliou

TL;DR
This paper introduces data invariants, a new data profiling primitive based on arithmetic relationships, to measure trust and detect data drift in data-driven systems, improving reliability and interpretability.
Contribution
It proposes a novel data invariants framework with quantitative violation measures, demonstrating its effectiveness in trusted machine learning and data drift detection.
Findings
Data invariants reliably identify untrustworthy predictions.
They quantify data drift more accurately than existing methods.
Low-variance PCA components serve as effective invariants.
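The last finding can be illustrated with a minimal sketch, assuming NumPy and a small numerical dataset. Projections onto the lowest-variance principal directions stay nearly constant on conforming data, so a large deviation along them signals a tuple that breaks the learned arithmetic relationship. The function names, the `tol` threshold, and the exponential squashing of the score are illustrative assumptions, not the paper's definitions.

```python
import numpy as np

def learn_invariants(X, k=1):
    """Fit the k lowest-variance principal directions of X (rows are tuples).

    Returns the column means, the directions (d x k), and the standard
    deviation of the training data along each direction.
    """
    mean = X.mean(axis=0)
    cov = np.cov(X - mean, rowvar=False)
    # eigh returns eigenvalues in ascending order, so the first k columns
    # are the directions along which the data varies least.
    eigvals, eigvecs = np.linalg.eigh(cov)
    directions = eigvecs[:, :k]
    sigma = np.sqrt(np.maximum(eigvals[:k], 1e-12))
    return mean, directions, sigma

def violation(x, mean, directions, sigma, tol=4.0):
    """Score in [0, 1): 0 means the tuple conforms, values near 1 mean strong violation.

    `tol` (in standard deviations) and the exponential squashing are
    hypothetical choices made for this sketch.
    """
    z = np.abs((np.atleast_2d(x) - mean) @ directions) / sigma
    excess = np.maximum(z - tol, 0.0).max(axis=1)
    return 1.0 - np.exp(-excess)

# Synthetic data that roughly satisfies y = 2x + 1.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 500)
y = 2 * x + 1 + rng.normal(0, 0.05, 500)
params = learn_invariants(np.column_stack([x, y]), k=1)

conforming = violation([3.0, 7.0], *params)   # lies on the line
drifted = violation([3.0, 20.0], *params)     # far from the line
```

Here `conforming` scores zero because the point satisfies the learned relationship within tolerance, while `drifted` scores close to one. Averaging such scores over a batch of incoming tuples would give one rough, aggregate indication of drift.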
Abstract
The reliability and proper function of data-driven applications hinge on the data's continued conformance to the applications' initial design. When data deviates from this initial profile, system behavior becomes unpredictable. Data profiling techniques such as functional dependencies and denial constraints encode patterns in the data that can be used to detect deviations. But traditional methods typically focus on exact constraints and categorical attributes, and are ill-suited for tasks such as determining whether the prediction of a machine learning system can be trusted or for quantifying data drift. In this paper, we introduce data invariants, a new data-profiling primitive that models arithmetic relationships involving multiple numerical attributes within a (noisy) dataset and which complements the existing data-profiling techniques. We propose a quantitative semantics to measure…
Taxonomy
Topics: Data Stream Mining Techniques · Time Series Analysis and Forecasting · Anomaly Detection Techniques and Applications
