Desensitized RDCA Subspaces for Compressive Privacy in Machine Learning

Artur Filipowicz; Thee Chanyaswad; S. Y. Kung

arXiv:1707.07770·cs.CR·July 26, 2017·1 cites

TL;DR

This paper uses Ridge Discriminant Component Analysis (RDCA) to desensitize data with respect to a privacy label, pushing accuracy on the privacy label down to roughly chance level at a small cost in utility accuracy across several datasets.

Contribution

It introduces a novel application of RDCA for data desensitization in machine learning, demonstrating effective privacy protection with low utility loss.

Findings

01

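Privacy accuracy falls to random-guess level on HAR and CMU Faces, and to almost zero for the privacy-sensitive Semeion digits

02

Utility accuracy drops by 0.04% to 7.53%, depending on the dataset

03

The method is effective on several datasets, including HAR, CMU Faces, and Semeion Handwritten Digit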

Abstract

The quest for better data analysis and artificial intelligence has led to more and more data being collected and stored. As a consequence, more data are exposed to malicious entities. This paper examines the problem of privacy in machine learning for classification. We utilize the Ridge Discriminant Component Analysis (RDCA) to desensitize data with respect to a privacy label. Based on five experiments, we show that desensitization by RDCA can effectively protect privacy (i.e., low accuracy on the privacy label) with a small loss in utility. On the HAR and CMU Faces datasets, the use of desensitized data results in random-guess-level accuracies for privacy at a cost of 5.14% and 0.04% average drops, respectively, in the utility accuracies. For the Semeion Handwritten Digit dataset, accuracies of the privacy-sensitive digits are almost zero, while the accuracies for the utility-relevant digits drop by 7.53%…
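
To make the idea of desensitizing data with respect to a privacy label concrete, here is a minimal sketch in Python with NumPy. It is not the authors' code. It assumes RDCA can be computed as the generalized eigenvectors of the between-class scatter against a ridge-regularized within-class scatter, and that desensitization means keeping only the directions with no between-class signal for the privacy label. The function name `rdca_desensitize`, the ridge value `rho`, and the synthetic data are all hypothetical; the paper's exact formulation and parameters are not given in this summary.

```python
import numpy as np

def rdca_desensitize(X, privacy_labels, rho=1e-3):
    """Hypothetical helper: project X onto directions that carry no
    between-class signal for the privacy label (a sketch, not the paper's code)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(privacy_labels)
    classes = np.unique(y)
    mean = X.mean(axis=0)
    d = X.shape[1]

    # Between-class (S_b) and within-class (S_w) scatter for the privacy label.
    S_b = np.zeros((d, d))
    S_w = np.zeros((d, d))
    for c in classes:
        Xc = X[y == c]
        diff = (Xc.mean(axis=0) - mean)[:, None]
        S_b += len(Xc) * (diff @ diff.T)
        centered = Xc - Xc.mean(axis=0)
        S_w += centered.T @ centered

    # Solve S_b v = lambda (S_w + rho I) v by whitening with a Cholesky factor.
    L = np.linalg.cholesky(S_w + rho * np.eye(d))
    L_inv = np.linalg.inv(L)
    M = L_inv @ S_b @ L_inv.T
    eigvals, U = np.linalg.eigh(M)   # eigenvalues in ascending order
    V = L_inv.T @ U                  # generalized eigenvectors

    # S_b has rank at most (number of classes - 1); drop those top directions
    # and keep the rest as the assumed "desensitized" subspace.
    n_signal = len(classes) - 1
    V_noise = V[:, : d - n_signal]
    return (X - mean) @ V_noise, V_noise

# Synthetic example (assumption): feature 0 leaks a binary privacy label.
rng = np.random.default_rng(0)
y_priv = np.repeat([0, 1], 100)
X = rng.normal(size=(200, 5))
X[:, 0] += 3.0 * y_priv
Z, V_noise = rdca_desensitize(X, y_priv)
```

In this toy setup the projected data `Z` has one fewer dimension than `X`, and the two privacy classes share the same mean in `Z`, so a classifier that relies on class means has nothing to separate them with. Whether utility survives depends on how much the utility label relies on the removed direction, which is what the paper's experiments measure.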

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Adversarial Robustness in Machine Learning · Internet Traffic Analysis and Secure E-voting