Differentially Private Histograms in the Shuffle Model from Fake Users

Albert Cheu; Maxim Zhilyaev

arXiv:2104.02739 · cs.CR · August 9, 2021

Open Access

TL;DR

This paper introduces a differentially private histogram protocol in the shuffle model in which each user sends only two messages when there are sufficiently many users. The protocol pairs each row in the dataset with a fake row and applies a simple randomization to all rows. Rigorous analysis and experiments on real-world data show that the resulting error is small.

Contribution

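It presents a shuffle-model histogram protocol whose message complexity is two when there are sufficiently many users, whereas the message complexity of earlier protocols scales with the number of bins $d$ or the privacy parameters. Because message complexity is an informative predictor of resource consumption, this lowers cost while keeping error small.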

Findings

01

Message complexity is constant (two messages per user) when there are sufficiently many users.

02

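The protocol pairs each row with a fake row and applies a simple randomization to all rows. Analysis and experiments on real-world data show that the error this introduces is small.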

03

Corrupt users have a provably low impact on the protocol's estimates.

Abstract

There has been much recent work in the shuffle model of differential privacy, particularly for approximate $d$-bin histograms. While these protocols achieve low error, the number of messages sent by each user (the message complexity) has so far scaled with $d$ or the privacy parameters. The message complexity is an informative predictor of a shuffle protocol's resource consumption. We present a protocol whose message complexity is two when there are sufficiently many users. The protocol essentially pairs each row in the dataset with a fake row and performs a simple randomization on all rows. We show that the error introduced by the protocol is small, using rigorous analysis as well as experiments on real-world data. We also prove that corrupt users have a relatively low impact on our protocol's estimates.
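
To make the "real row plus fake row" idea concrete, here is a minimal Python sketch of a two-message-per-user histogram pipeline. The abstract does not specify the randomization, so everything below is an assumption made for illustration: fake rows are drawn uniformly from the $d$ bins, the "simple randomization" is modeled as randomized response with a hypothetical parameter `keep_prob`, and the names `randomize`, `user_messages`, and `shuffle_and_estimate` are invented here. It is not the authors' construction, and it says nothing about how to calibrate `keep_prob` to a target privacy guarantee.

```python
import random
from collections import Counter


def randomize(value, d, keep_prob, rng):
    """Assumed randomization: keep the value with probability keep_prob,
    otherwise replace it with a uniformly random bin in [0, d)."""
    if rng.random() < keep_prob:
        return value
    return rng.randrange(d)


def user_messages(value, d, keep_prob, rng):
    """Each user sends exactly two messages: the randomized real row and a
    randomized fake row (the fake row is uniform here, by assumption)."""
    fake = rng.randrange(d)
    return [randomize(value, d, keep_prob, rng),
            randomize(fake, d, keep_prob, rng)]


def shuffle_and_estimate(data, d, keep_prob, seed=0):
    """Collect all messages, shuffle them, and return debiased bin counts."""
    rng = random.Random(seed)
    messages = []
    for value in data:
        messages.extend(user_messages(value, d, keep_prob, rng))
    rng.shuffle(messages)  # stands in for the trusted shuffler

    counts = Counter(messages)
    n = len(data)
    # Under the assumptions above, the expected number of messages in bin j is
    #   keep_prob * c_j + (1 - keep_prob) * n / d   (real rows)
    #   + n / d                                     (fake rows, uniform)
    # so subtracting the noise term and rescaling gives an unbiased estimate.
    noise = n * (2 - keep_prob) / d
    return [(counts.get(j, 0) - noise) / keep_prob for j in range(d)]


if __name__ == "__main__":
    data = [0] * 500 + [1] * 300 + [2] * 200  # true counts: 500, 300, 200, 0
    print(shuffle_and_estimate(data, d=4, keep_prob=0.8, seed=7))
```

Because every user contributes exactly two messages, the debiased estimates in this sketch always sum to the number of users, which is a quick sanity check when experimenting with different values of `keep_prob`.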

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Internet Traffic Analysis and Secure E-voting · Privacy, Security, and Data Protection