Utility and Privacy of Data Sources: Can Shannon Help Conceal and Reveal Information?
Lalitha Sankar, S. Raj Rajagopalan, H. Vincent Poor

TL;DR
This paper proposes a framework based on rate distortion theory to quantify the privacy-utility tradeoff in large data repositories, covering both private information leakage and the effect of external knowledge (side information) on privacy.
Contribution
It introduces an analytical framework using rate distortion theory to measure privacy and utility, applicable across various data types and privacy methods.
Findings
Develops application-independent privacy and utility metrics
Models external side-information affecting privacy
Analyzes successive disclosures in data sources
Abstract
The problem of private information "leakage" (inadvertently or by malicious design) from the myriad large centralized searchable data repositories drives the need for an analytical framework that quantifies unequivocally how safe private data can be (privacy) while still providing useful benefit (utility) to multiple legitimate information consumers. Rate distortion theory is shown to be a natural choice to develop such a framework which includes the following: modeling of data sources, developing application independent utility and privacy metrics, quantifying utility-privacy tradeoffs irrespective of the type of data sources or the methods of providing privacy, developing a side-information model for dealing with questions of external knowledge, and studying a successive disclosure problem for multiple query data sources.
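As a rough illustration of the kind of utility-privacy tradeoff described above, the sketch below uses a toy model that is not taken from the paper: a single binary attribute is released through a channel that flips it with some probability. Utility is measured here as expected Hamming distortion and privacy as equivocation (the conditional entropy of the true value given the released value). Both metric choices, the function name `tradeoff_point`, and the parameters `prior_one` and `flip_prob` are assumptions made for this example only.

```python
import math


def tradeoff_point(prior_one: float, flip_prob: float) -> dict:
    """Toy utility/privacy numbers for one binary attribute X released as Y.

    Assumption (illustrative only): Y equals X flipped with probability
    flip_prob, utility is expected Hamming distortion, and privacy is the
    equivocation H(X | Y) in bits.
    """
    px = [1.0 - prior_one, prior_one]

    # Joint distribution P(X = x, Y = y) under the flipping channel.
    joint = [
        [px[x] * (flip_prob if x != y else 1.0 - flip_prob) for y in (0, 1)]
        for x in (0, 1)
    ]
    py = [joint[0][y] + joint[1][y] for y in (0, 1)]

    # Expected Hamming distortion: probability that the released value differs.
    distortion = joint[0][1] + joint[1][0]

    # Equivocation H(X | Y) = -sum_{x,y} P(x, y) * log2 P(x | y).
    equivocation = 0.0
    for x in (0, 1):
        for y in (0, 1):
            if joint[x][y] > 0.0:
                equivocation -= joint[x][y] * math.log2(joint[x][y] / py[y])

    # Source entropy H(X); leakage is the mutual information I(X; Y).
    source_entropy = -sum(p * math.log2(p) for p in px if p > 0.0)
    leakage = source_entropy - equivocation

    return {
        "distortion": distortion,
        "equivocation": equivocation,
        "leakage": leakage,
    }


if __name__ == "__main__":
    # Sweep the flip probability to trace out the toy tradeoff curve.
    for p in (0.0, 0.1, 0.25, 0.5):
        point = tradeoff_point(prior_one=0.5, flip_prob=p)
        print(p, point)
```

In this toy setting, raising the flip probability toward 0.5 increases distortion (lower utility) and increases equivocation (higher privacy), which is the qualitative shape of tradeoff the framework is meant to quantify.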
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Cryptography and Data Security · Internet Traffic Analysis and Secure E-voting
