Estimating the True Distribution of Data Collected with Randomized Response
Carlos Antonio Pinzón, Ehab ElSalamouny, Lucas Massot, Alexis Miller, Héber Hwang Arcolezi, Catuscia Palamidessi

TL;DR
This paper derives an exact maximum likelihood estimator for data collected via Randomized Response, improving estimation accuracy and computational efficiency over existing methods like IBU, with practical implications for privacy-preserving data collection.
Contribution
It provides a simple formula for the exact MLE in Randomized Response, bypassing iterative algorithms and enhancing estimation accuracy.
Findings
The exact MLE outperforms IBU in accuracy.
The formula is computationally efficient.
Experimental comparison guides method choice.
Abstract
Randomized Response (RR) is a protocol designed to collect and analyze categorical data with local differential privacy guarantees. It has been used as a building block of mechanisms deployed by big tech companies to collect app or web users' data. Each user reports an automatic random alteration of their true value to the analytics server, which then estimates the histogram of the true unseen values of all users using a debiasing rule to compensate for the added randomness. A known issue is that the standard debiasing rule can yield a vector with negative values (which cannot be interpreted as a histogram), and there is no consensus on the best fix. An elegant but slow solution is the Iterative Bayesian Update algorithm (IBU), which converges to the Maximum Likelihood Estimate (MLE) as the number of iterations goes to infinity. This paper bypasses IBU by providing a simple formula for…
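The pipeline described in the abstract can be illustrated with a minimal Python sketch. It assumes the common k-ary form of RR, in which a user reports the true value with probability p = e^ε / (e^ε + k − 1) and each other value with probability q = 1 / (e^ε + k − 1); the abstract does not specify this parametrization. The function names (`krr_params`, `randomize`, `debias`, `ibu`) and the demo settings are hypothetical. The sketch shows the standard debiasing rule, which can return negative entries, and a textbook IBU iteration, which stays on the probability simplex. It does not attempt to reproduce the paper's closed-form MLE, which the abstract above does not state.

```python
import numpy as np


def krr_params(k, epsilon):
    """Assumed k-ary RR: probability p of reporting the truth, q for each other value."""
    e = np.exp(epsilon)
    return e / (e + k - 1), 1.0 / (e + k - 1)


def randomize(values, k, epsilon, rng):
    """Apply k-ary RR independently to each true value in {0, ..., k-1}."""
    p, _ = krr_params(k, epsilon)
    values = np.asarray(values)
    keep = rng.random(values.size) < p
    # A shift in {1, ..., k-1} modulo k yields a uniformly chosen *different* value.
    shift = rng.integers(1, k, size=values.size)
    return np.where(keep, values, (values + shift) % k)


def debias(reports, k, epsilon):
    """Standard debiasing rule: invert E[observed] = q + (p - q) * theta."""
    p, q = krr_params(k, epsilon)
    observed = np.bincount(reports, minlength=k) / len(reports)
    return (observed - q) / (p - q)  # sums to 1, but entries may be negative


def ibu(reports, k, epsilon, iterations=1000):
    """Iterative Bayesian Update (an EM iteration) started from the uniform distribution."""
    p, q = krr_params(k, epsilon)
    channel = np.full((k, k), q) + (p - q) * np.eye(k)  # channel[x, y] = P(report y | true x)
    observed = np.bincount(reports, minlength=k) / len(reports)
    theta = np.full(k, 1.0 / k)
    for _ in range(iterations):
        marginal = theta @ channel  # P(report = y) under the current estimate
        theta = theta * (channel @ (observed / marginal))
    return theta


# Demo with made-up settings: all users hold value 0 or 1 out of k = 10 categories.
rng = np.random.default_rng(0)
k, epsilon = 10, 0.5
true_values = rng.choice(2, size=2000)
reports = randomize(true_values, k, epsilon, rng)
print("debiased:", np.round(debias(reports, k, epsilon), 3))
print("IBU:     ", np.round(ibu(reports, k, epsilon), 3))
```

With a small ε and several categories that no user holds, sampling noise typically pushes some debiased entries below zero, while every IBU iterate remains non-negative and sums to one. The fixed iteration count is an arbitrary choice for illustration; a convergence check on successive estimates would be the usual alternative.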
Taxonomy
Topics: Survey Sampling and Estimation Techniques · Privacy-Preserving Technologies in Data · Mobile Crowdsensing and Crowdsourcing
