On Addressing Efficiency Concerns in Privacy Preserving Data Mining

Shipra Agrawal; Vijay Krishnan; Jayant Haritsa

arXiv:cs/0310038 · cs.DB · November 9, 2011 · 5 cites

TL;DR

This paper improves the efficiency of privacy-preserving data mining by optimizing the distortion and reconstruction process, making it feasible to mine distorted data without excessive computational costs.

Contribution

It introduces symbol-specific distortion and optimization techniques to significantly reduce the computational overhead of privacy-preserving data mining.

Findings

01. Runtime efficiency within an order of magnitude of original mining

02. Effective symbol-specific distortion scheme

03. Optimized reconstruction process for faster results

Abstract

Data mining services require accurate input data for their results to be meaningful, but privacy concerns may influence users to provide spurious information. To encourage users to provide correct inputs, we recently proposed a data distortion scheme for association rule mining that simultaneously provides both privacy to the user and accuracy in the mining results. However, mining the distorted database can be orders of magnitude more time-consuming than mining the original database. In this paper, we address this issue and demonstrate that by (a) generalizing the distortion process to perform symbol-specific distortion, (b) appropriately choosing the distortion parameters, and (c) applying a variety of optimizations in the reconstruction process, runtime efficiencies that are well within an order of magnitude of undistorted mining can be achieved.
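
To make the idea of symbol-specific distortion concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each transaction is a 0/1 vector, that a 1 is retained with probability p and a 0 with probability q (otherwise the bit is flipped), and that the support of a single item is estimated by inverting the expected distorted count. The names `distort` and `estimate_true_ones` and the parameters `p` and `q` are hypothetical and chosen only for illustration.

```python
import random


def distort(bits, p, q, rng):
    """Flip bits at random: keep a 1 with probability p, keep a 0 with probability q.

    Assumption for illustration: 1s and 0s get separate retention probabilities,
    which is one reading of "symbol-specific distortion".
    """
    out = []
    for b in bits:
        keep = p if b == 1 else q
        out.append(b if rng.random() < keep else 1 - b)
    return out


def estimate_true_ones(distorted_ones, n, p, q):
    """Estimate how many of n original bits were 1, given the distorted count of 1s.

    Under the assumption above:
        E[distorted_ones] = p * true_ones + (1 - q) * (n - true_ones)
    Solving for true_ones gives the expression returned below.
    """
    denom = p + q - 1.0
    if abs(denom) < 1e-12:
        raise ValueError("p + q must differ from 1 for the estimate to be defined")
    return (distorted_ones - (1.0 - q) * n) / denom


if __name__ == "__main__":
    rng = random.Random(0)
    column = [1] * 200 + [0] * 800  # one item's column over 1000 transactions
    noisy = distort(column, p=0.9, q=0.95, rng=rng)
    print(estimate_true_ones(sum(noisy), len(noisy), p=0.9, q=0.95))
```

This sketch covers a single item only. Estimating the support of larger itemsets involves more combinations of distorted bit patterns, which is presumably where the reconstruction cost that the paper targets arises.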

Taxonomy

Topics: Imbalanced Data Classification Techniques · Data Mining Algorithms and Applications · Privacy-Preserving Technologies in Data