Anonymizing Unstructured Data

Rajeev Motwani; Shubha U. Nabar

arXiv:0810.5582 · cs.DB · November 4, 2008 · 23 cites

TL;DR

This paper introduces a formal framework for anonymizing set-valued data, such as search logs, by extending k-anonymity. It gives approximation algorithms with provable guarantees and demonstrates them on the America Online (AOL) query log dataset.

Contribution

It formalizes k-anonymity for set-valued data and offers approximation algorithms with proven bounds, addressing privacy in unstructured datasets.
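
The paper's exact definition is not reproduced on this page. Under the natural assumption that the set-valued variant requires every individual's item set to be identical to the sets of at least k-1 other individuals, the anonymity condition can be checked as in the minimal Python sketch below. The helper name is_k_anonymous is hypothetical and does not come from the paper.

```python
from collections import Counter
from typing import List, Set


def is_k_anonymous(records: List[Set[str]], k: int) -> bool:
    """Return True if every item set in `records` occurs at least k times.

    Assumption (not the paper's verbatim definition): a set-valued dataset
    is k-anonymous when each individual's set equals the sets of at least
    k-1 other individuals.
    """
    # frozenset makes each item set hashable, so identical sets are counted together
    counts = Counter(frozenset(r) for r in records)
    return all(c >= k for c in counts.values())


# Usage: a toy query-log dataset with one unique record
logs = [{"flu", "fever"}, {"fever", "flu"}, {"tax forms"}]
print(is_k_anonymous(logs, 2))  # False: {"tax forms"} occurs only once
```

Comparing sets rather than lists means the order in which a user issued queries does not matter. This matches treating each individual's data as an unordered set of items.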

Findings

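1. The algorithms achieve O(k log k) and O(1) approximation guarantees.

2. The methods apply to real-world set-valued data such as the America Online (AOL) search query logs.

3. Running the anonymization algorithms on the AOL query log dataset shows that they are applicable in practice.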

Abstract

In this paper we consider the problem of anonymizing datasets in which each individual is associated with a set of items that constitute private information about the individual. Illustrative datasets include market-basket datasets and search engine query logs. We formalize the notion of k-anonymity for set-valued data as a variant of the k-anonymity model for traditional relational datasets. We define an optimization problem that arises from this definition of anonymity and provide O(k log k) and O(1)-approximation algorithms for the same. We demonstrate applicability of our algorithms to the America Online query log dataset.
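
The optimization problem asks for the cheapest modification of the data that makes it k-anonymous. The minimal Python sketch below rests on three assumptions. First, each single item addition or deletion costs 1. Second, the sets have already been split into groups of at least k sets each. Third, every set in a group is rewritten to one common set. Under these assumptions the code only prices a given partition. Choosing a good partition is the hard part that the paper's O(k log k) and O(1)-approximation algorithms address, and this code does not implement those algorithms. The helper names group_cost and partition_cost are hypothetical.

```python
from collections import Counter
from typing import List, Set, Tuple


def group_cost(group: List[Set[str]]) -> Tuple[Set[str], int]:
    """Cheapest common set for one group, and the cost of rewriting the group to it.

    Assumption: adding or deleting a single item costs 1, so the cost of
    turning set s into the common set is |s ^ common| (symmetric difference).
    Keeping exactly the items that appear in more than half of the sets
    minimizes the total cost.
    """
    n = len(group)
    # Each set contributes each of its items once, so freq[item] = number of sets containing it
    freq = Counter(item for s in group for item in s)
    common = {item for item, c in freq.items() if 2 * c > n}
    cost = sum(len(s ^ common) for s in group)
    return common, cost


def partition_cost(groups: List[List[Set[str]]], k: int) -> int:
    """Total cost of making every group internally identical (hypothetical helper)."""
    # Assumption: a group smaller than k cannot satisfy k-anonymity
    if any(len(g) < k for g in groups):
        raise ValueError("every group must contain at least k sets")
    return sum(group_cost(g)[1] for g in groups)


# Usage: two groups of size 2, checked for 2-anonymity
groups = [[{"flu", "fever"}, {"flu"}], [{"tax forms"}, {"tax forms"}]]
print(partition_cost(groups, 2))  # 1: delete "fever" from the first set
```

The majority rule is optimal item by item under this cost model. Including an item in the common set costs the number of sets that lack it, and excluding it costs the number of sets that contain it. On a tie, both choices cost the same, and the sketch excludes the item.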

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Cryptography and Data Security · Mobile Crowdsensing and Crowdsourcing