Anonymizing Unstructured Data
Rajeev Motwani, Shubha U. Nabar

TL;DR
This paper introduces a formal framework for anonymizing set-valued data, such as search logs, by extending k-anonymity, and gives approximation algorithms whose applicability is demonstrated on the AOL query log dataset.
Contribution
It formalizes k-anonymity for set-valued data and offers approximation algorithms with proven bounds, addressing privacy in unstructured datasets.
Findings
The algorithms achieve O(k log k)- and O(1)-approximation guarantees for the resulting optimization problem.
Applicability is demonstrated on a real-world dataset, the America Online (AOL) query log.
Abstract
In this paper we consider the problem of anonymizing datasets in which each individual is associated with a set of items that constitute private information about the individual. Illustrative datasets include market-basket datasets and search engine query logs. We formalize the notion of k-anonymity for set-valued data as a variant of the k-anonymity model for traditional relational datasets. We define an optimization problem that arises from this definition of anonymity and provide O(k log k)- and O(1)-approximation algorithms for the same. We demonstrate applicability of our algorithms to the America Online query log dataset.
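To make the set-valued notion of k-anonymity concrete, here is a minimal Python sketch. It rests on assumptions the abstract does not spell out: a dataset is treated as k-anonymous when every individual's item set is identical to the item sets of at least k-1 other individuals, and the cost of an anonymization is counted as the total number of item additions and deletions. The names `is_k_anonymous` and `edit_cost` are hypothetical and are not taken from the paper, and the sketch does not implement the paper's approximation algorithms.

```python
from collections import Counter


def is_k_anonymous(records, k):
    """Check the assumed definition: every item set occurs at least k times.

    `records` is a list of item collections, one per individual.
    """
    counts = Counter(frozenset(r) for r in records)
    return all(c >= k for c in counts.values())


def edit_cost(original, anonymized):
    """Assumed cost: total items added or deleted across all records.

    Records are matched by position; the symmetric difference of two sets
    counts exactly the items present in one set but not the other.
    """
    return sum(len(set(a) ^ set(b)) for a, b in zip(original, anonymized))


# Toy query-log style data: the third individual is unique, so k=2 fails.
original = [{"flu", "cough"}, {"flu", "cough"}, {"shoes"}]
# One possible anonymization: rewrite the third set to match the others.
anonymized = [{"flu", "cough"}, {"flu", "cough"}, {"flu", "cough"}]
```

Under these assumptions, `is_k_anonymous(original, 2)` is false while `is_k_anonymous(anonymized, 2)` is true, and `edit_cost(original, anonymized)` counts one deletion plus two additions. An optimization problem of the kind the abstract mentions would ask for an anonymized dataset that minimizes such a cost.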
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Cryptography and Data Security · Mobile Crowdsensing and Crowdsourcing
