Multi-Dimensional Randomized Response

Josep Domingo-Ferrer; Jordi Soria-Comas

arXiv:2010.10881 · cs.CR · December 22, 2020

Open Access

TL;DR

This paper addresses the challenge of applying randomized response to high-dimensional data by proposing clustering and adjustment techniques to improve the accuracy of data distribution estimates while maintaining privacy guarantees.

Contribution

It introduces clustering and adjustment algorithms to mitigate the curse of dimensionality in multi-attribute randomized response methods.

Findings

01 Clustering attributes improves estimation accuracy in high-dimensional RR.

02 Adjustment algorithms help correct biases introduced by independence assumptions.

03 Empirical results demonstrate the effectiveness of the proposed methods.
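A rough way to see why grouping attributes can help: if RR is applied jointly to all attributes, the distribution to be estimated has one cell per combination of attribute values, whereas applying RR separately to each cluster only needs the combinations within that cluster. The sketch below just counts cells. The attribute cardinalities and the clustering are hypothetical, and the function names are illustrative rather than taken from the paper.

```python
from math import prod


def joint_domain_size(cardinalities):
    """Number of cells when RR is run jointly on all attributes."""
    return prod(cardinalities)


def clustered_domain_sizes(cardinalities, clusters):
    """Number of cells per cluster when RR is run separately on each cluster.

    `clusters` is a list of lists of attribute indices (a hypothetical
    grouping, not one produced by the paper's clustering algorithm).
    """
    return [prod(cardinalities[i] for i in cluster) for cluster in clusters]


# Hypothetical data set: six categorical attributes.
cardinalities = [2, 5, 7, 4, 3, 6]
clusters = [[0, 1], [2, 3], [4, 5]]

print(joint_domain_size(cardinalities))                 # 5040
print(clustered_domain_sizes(cardinalities, clusters))  # [10, 28, 18]
```

With the same number of respondents, each of the 10, 28, or 18 cells in a cluster receives far more reports than each of the 5040 joint cells, at the cost of losing information about dependencies between clusters.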

Abstract

In our data world, a host of not necessarily trusted controllers gather data on individual subjects. To preserve her privacy and, more generally, her informational self-determination, the individual has to be empowered by giving her agency over her own data. Maximum agency is afforded by local anonymization, which allows each individual to anonymize her own data before handing them to the data controller. Randomized response (RR) is a local anonymization approach able to yield multi-dimensional full sets of anonymized microdata that are valid for exploratory analysis and machine learning. This is so because an unbiased estimate of the distribution of the true data of individuals can be obtained from their pooled randomized data. Furthermore, RR offers rigorous privacy guarantees. The main weakness of RR is the curse of dimensionality when applied to several attributes: as the number of…
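The claim that pooled randomized data yield an unbiased estimate of the true distribution can be illustrated for a single categorical attribute. This is a minimal sketch under assumed settings, not the paper's protocol: each respondent reports her true value with probability `p_keep` and otherwise reports one of the other categories uniformly at random. The function names, the three categories, and the value `p_keep = 0.7` are all hypothetical.

```python
import random
from collections import Counter


def randomize(value, categories, p_keep, rng):
    """Report the true value with probability p_keep, else a different
    category chosen uniformly at random (assumed RR design)."""
    if rng.random() < p_keep:
        return value
    others = [c for c in categories if c != value]
    return rng.choice(others)


def estimate_distribution(reports, categories, p_keep):
    """Unbiased estimate of the true category frequencies.

    Under the design above, the expected reported frequency of category j
    is q + (p_keep - q) * pi_j, where q = (1 - p_keep) / (k - 1).
    Inverting that linear map gives the estimate. Requires p_keep != 1/k.
    """
    k = len(categories)
    q = (1.0 - p_keep) / (k - 1)
    n = len(reports)
    counts = Counter(reports)
    return {c: (counts[c] / n - q) / (p_keep - q) for c in categories}


if __name__ == "__main__":
    rng = random.Random(0)
    categories = ["A", "B", "C"]
    # Hypothetical true distribution: 60% A, 30% B, 10% C.
    true_values = rng.choices(categories, weights=[0.6, 0.3, 0.1], k=20000)
    reports = [randomize(v, categories, 0.7, rng) for v in true_values]
    print(estimate_distribution(reports, categories, 0.7))
```

The estimates always sum to 1, but for small samples an individual estimate can fall slightly below 0 or above 1, because the inversion is applied to noisy observed frequencies.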

Taxonomy

Topics: Privacy-Preserving Technologies in Data · SARS-CoV-2 detection and testing · Machine Learning and Algorithms