N-grams Bayesian Differential Privacy

Osman Ramadan; James Withers; Douglas Orr

arXiv:2101.12736 · cs.CR · February 1, 2021 · 1 citation

Open Access

TL;DR

This paper introduces a Bayesian differential privacy mechanism for n-gram counts that improves privacy-utility trade-offs in language models by leveraging public data as a prior, outperforming existing methods.

Contribution

It proposes a novel Bayesian approach using public data as a prior to achieve tighter privacy bounds and better utility in n-gram language modeling.

Findings

01. Achieves up to an 85% reduction in KL divergence compared to previously known mechanisms at ε = 0.1

02. Provides stronger privacy protection than k-anonymity

03. Offers competitive performance at large vocabularies

Abstract

Differential privacy has gained popularity in machine learning as a strong privacy guarantee, in contrast to privacy mitigation techniques such as k-anonymity. However, applying differential privacy to n-gram counts significantly degrades the utility of derived language models due to their large vocabularies. We propose a differential privacy mechanism that uses public data as a prior in a Bayesian setup to provide tighter bounds on the privacy loss metric ε, and thus better privacy-utility trade-offs. It first transforms the counts to log space, approximating the distribution of the public and private data as Gaussian. The posterior distribution is then evaluated and softmax is applied to produce a probability distribution. This technique achieves up to 85% reduction in KL divergence compared to previously known mechanisms at ε = 0.1. We compare our mechanism to…
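
The pipeline the abstract describes (counts to log space, a Gaussian prior from public data, a posterior, then softmax) can be illustrated with a minimal sketch. This is not the authors' mechanism: the function names, the add-one smoothing, the fixed `prior_var` and `noise_var`, and the use of a standard Gaussian conjugate update are all assumptions made for illustration. In particular, the noise here is not calibrated to any ε, so the sketch carries no differential privacy guarantee on its own.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def gaussian_posterior_mean(prior_mean, prior_var, obs, obs_var):
    """Posterior mean for a Gaussian prior combined with one Gaussian observation."""
    precision = 1.0 / prior_var + 1.0 / obs_var
    return (prior_mean / prior_var + obs / obs_var) / precision


def private_ngram_distribution(public_counts, private_counts,
                               prior_var=1.0, noise_var=1.0, rng=None):
    """Hypothetical sketch: noisy log-space private counts shrunk toward a public prior.

    prior_var and noise_var are illustrative constants, NOT calibrated to epsilon.
    """
    rng = rng or random.Random(0)
    log_posterior = []
    for pub, priv in zip(public_counts, private_counts):
        # Add-one smoothing keeps log() defined for zero counts (an assumption).
        prior_mean = math.log(pub + 1.0)
        noisy_obs = math.log(priv + 1.0) + rng.gauss(0.0, math.sqrt(noise_var))
        log_posterior.append(
            gaussian_posterior_mean(prior_mean, prior_var, noisy_obs, noise_var)
        )
    # Softmax turns the posterior log-scores into a probability distribution.
    return softmax(log_posterior)


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Toy usage: three n-grams, with public counts acting as the prior.
public = [50, 30, 20]
private = [40, 10, 5]
released = private_ngram_distribution(public, private)
true_dist = [c / sum(private) for c in private]
print(released, kl_divergence(true_dist, released))
```

The KL divergence printed at the end mirrors the utility metric named in the abstract: a lower value means the released distribution stays closer to the true private distribution.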

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Cryptography and Data Security · Internet Traffic Analysis and Secure E-voting

Methods: Softmax