Meaningful Data Erasure in the Presence of Dependencies
Vishal Chakraborty, Youri Kaminsky, Sharad Mehrotra, Felix Naumann, Faisal Nawab, Primal Pappachan, Mohammad Sadoghi, Nalini Venkatasubramanian

TL;DR
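This paper introduces a formal definition of data erasure that accounts for dependencies. Under this definition, whatever can be inferred about erased data through those dependencies stays bounded by what could have been inferred before the data was inserted. The paper also proposes scalable mechanisms to enforce this guarantee in databases.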
Contribution
It provides a formal framework for data erasure that accounts for dependencies, and develops efficient algorithms to enforce and optimize erasure.
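To make the dependency problem concrete, the following sketch models a database as named cells, some of which are computed from others. It uses hypothetical names (`cells_to_erase`, `Dependencies`) and shows why deleting a single value is not enough: a value derived from it can still reveal it. The cascade it implements (erase the targets plus everything transitively derived from them) is a simplified illustration, not the paper's erasure mechanism. It also ignores other inference paths, such as reconstructing a value from constraints among cells that were not derived from it.

```python
from collections import defaultdict, deque
from typing import Iterable

# Hypothetical dependency model (not from the paper): each derived cell maps
# to the set of cells it was computed from.
Dependencies = dict[str, set[str]]


def cells_to_erase(targets: Iterable[str], deps: Dependencies) -> set[str]:
    """Naive cascade: erase the targets plus every cell transitively derived from them.

    Passing several targets handles a batch in one traversal, so cells shared
    by multiple erasure requests are visited only once.
    """
    # Invert the map so we can walk from a source cell to the cells built from it.
    derived_from: dict[str, set[str]] = defaultdict(set)
    for derived, sources in deps.items():
        for src in sources:
            derived_from[src].add(derived)

    # Breadth-first search over "is derived from" edges.
    to_erase = set(targets)
    queue = deque(to_erase)
    while queue:
        cell = queue.popleft()
        for child in derived_from[cell]:
            if child not in to_erase:
                to_erase.add(child)
                queue.append(child)
    return to_erase


# Example: order_total = price + tax, and an invoice is rendered from order_total.
deps: Dependencies = {
    "order_total": {"price", "tax"},
    "invoice_pdf": {"order_total"},
}
# Deleting only "price" would leak it via price = order_total - tax,
# so the cascade also removes order_total and invoice_pdf.
print(sorted(cells_to_erase(["price"], deps)))  # ['invoice_pdf', 'order_total', 'price']
```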
Findings
Algorithms are practical and scalable on both real and synthetic datasets
Proposed strategies balance erasure cost against throughput
Data retention times can be computed proactively when possible
Abstract
Data regulations like GDPR require systems to support data erasure but leave the definition of "erasure" open to interpretation. This ambiguity makes compliance challenging, especially in databases where data dependencies can lead to erased data being inferred from remaining data. We formally define a precise notion of data erasure that ensures any inference about deleted data, through dependencies, remains bounded to what could have been inferred before its insertion. We design erasure mechanisms that enforce this guarantee at minimal cost. Additionally, we explore strategies to balance cost and throughput, batch multiple erasures, and proactively compute data retention times when possible. We demonstrate the practicality and scalability of our algorithms using both real and synthetic datasets.
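One way to picture proactively computed retention times is to propagate deadlines along dependencies. The sketch assumes a rule that is not taken from the paper: a derived cell must be erased no later than the earliest deadline among the cells it was computed from (or its own policy deadline, if that is earlier). The function name `retention_deadlines` and the example cells are hypothetical.

```python
from datetime import date


# Hypothetical rule (an assumption, not the paper's definition): a derived cell
# must be erased no later than the earliest deadline of any cell it depends on.
def retention_deadlines(
    explicit: dict[str, date], deps: dict[str, set[str]]
) -> dict[str, date]:
    """Propagate retention deadlines from source cells to derived cells.

    explicit: deadlines set directly by policy (e.g. per-user retention periods).
    deps: derived cell -> cells it was computed from; assumed acyclic.
    """
    memo: dict[str, date] = {}

    def deadline(cell: str) -> date:
        if cell in memo:
            return memo[cell]
        # Deadlines inherited from every source, plus the cell's own policy, if any.
        candidates = [deadline(src) for src in deps.get(cell, set())]
        if cell in explicit:
            candidates.append(explicit[cell])
        if not candidates:
            raise ValueError(f"no retention deadline known for {cell!r}")
        memo[cell] = min(candidates)
        return memo[cell]

    for cell in set(explicit) | set(deps):
        deadline(cell)
    return memo


explicit = {"price": date(2025, 6, 1), "tax": date(2026, 1, 1)}
deps = {"order_total": {"price", "tax"}, "invoice_pdf": {"order_total"}}
deadlines = retention_deadlines(explicit, deps)
print(deadlines["invoice_pdf"])  # 2025-06-01
```

Because deadlines are computed once per cell and memoized, a derived cell's expiry can be scheduled at insertion time instead of being discovered when a source is later erased.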
