Multi-Dimensional Randomized Response
Josep Domingo-Ferrer, Jordi Soria-Comas

TL;DR
This paper addresses the challenge of applying randomized response to high-dimensional data by proposing clustering and adjustment techniques to improve the accuracy of data distribution estimates while maintaining privacy guarantees.
Contribution
It introduces clustering and adjustment algorithms to mitigate the curse of dimensionality in multi-attribute randomized response methods.
Findings
Clustering attributes improves estimation accuracy in high-dimensional RR.
Adjustment algorithms help correct biases introduced by independence assumptions.
Empirical results demonstrate the effectiveness of the proposed methods.
Abstract
In our data world, a host of not necessarily trusted controllers gather data on individual subjects. To preserve her privacy and, more generally, her informational self-determination, the individual has to be empowered by giving her agency over her own data. Maximum agency is afforded by local anonymization, which allows each individual to anonymize her own data before handing them to the data controller. Randomized response (RR) is a local anonymization approach able to yield multi-dimensional full sets of anonymized microdata that are valid for exploratory analysis and machine learning. This is so because an unbiased estimate of the distribution of the true data of individuals can be obtained from their pooled randomized data. Furthermore, RR offers rigorous privacy guarantees. The main weakness of RR is the curse of dimensionality when applied to several attributes: as the number of…
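The abstract's claim that an unbiased estimate of the true distribution can be recovered from pooled randomized data can be illustrated with a minimal sketch. It assumes one particular RR design for a single categorical attribute (report the true value with probability p, otherwise a uniform draw over all k categories), which is a common textbook choice and not necessarily the design used in the paper. All names here (`randomize`, `estimate_distribution`, `joint_domain_size`, the value p = 0.7, the toy categories) are hypothetical.

```python
import random
from collections import Counter


def randomize(value, categories, p, rng):
    """Report the true value with probability p, else a uniform draw over all categories."""
    if rng.random() < p:
        return value
    return rng.choice(categories)


def estimate_distribution(reports, categories, p):
    """Estimate the true distribution from the pooled randomized reports.

    Under the assumed design, the probability of reporting v is
        lambda_v = p * pi_v + (1 - p) / k,
    so solving for pi_v gives an unbiased estimate. Individual estimates
    can fall slightly below 0 when the sample is small.
    """
    k = len(categories)
    n = len(reports)
    counts = Counter(reports)
    return {c: (counts[c] / n - (1 - p) / k) / p for c in categories}


def joint_domain_size(domain_sizes):
    """Number of category combinations when attributes are randomized jointly."""
    size = 1
    for s in domain_sizes:
        size *= s
    return size


# Toy data: the true distribution is an assumption made for illustration only.
rng = random.Random(0)
categories = ["a", "b", "c"]
true_probs = [0.6, 0.3, 0.1]
true_values = rng.choices(categories, weights=true_probs, k=50_000)

reports = [randomize(v, categories, 0.7, rng) for v in true_values]
estimate = estimate_distribution(reports, categories, 0.7)
```

The helper `joint_domain_size` hints at the curse of dimensionality named in the abstract: randomizing several attributes jointly means estimating one probability per combination of categories, and that count is the product of the attribute domain sizes, so it grows quickly while the number of respondents stays fixed.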
Taxonomy
Topics: Privacy-Preserving Technologies in Data · SARS-CoV-2 detection and testing · Machine Learning and Algorithms
