Differentially Private n-gram Extraction

Kunho Kim; Sivakanth Gopi; Janardhan Kulkarni; Sergey Yekhanin

arXiv:2108.02831 · cs.LG · August 9, 2021


TL;DR

This paper introduces a new differentially private algorithm for extracting n-grams from private text data. In the authors' experiments, it significantly improves utility over previous methods by combining recent advances in differentially private set union (DPSU) and privacy accounting with new pruning heuristics.

Contribution

The paper presents a differentially private n-gram extraction algorithm that, in the authors' experiments, outperforms existing methods by combining DPSU, improved privacy accounting, and new pruning heuristics for the tree-based approach of Chen et al. (2012).
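To make the "privacy accounting" ingredient concrete, here is a minimal sketch of one standard approach. It is an assumption for illustration, not necessarily the accounting used in the paper: each Gaussian noise step (for example, one per n-gram length) is treated as ρ-zCDP, the ρ values are added, and the total is converted to (ε, δ)-DP. The function names are hypothetical.

```python
import math

def gaussian_rho(l2_sensitivity: float, sigma: float) -> float:
    # Gaussian mechanism with L2 sensitivity D and noise std sigma is rho-zCDP,
    # with rho = D^2 / (2 * sigma^2).
    return l2_sensitivity ** 2 / (2.0 * sigma ** 2)

def zcdp_to_approx_dp(rho: float, delta: float) -> float:
    # Standard conversion: rho-zCDP implies (rho + 2*sqrt(rho*ln(1/delta)), delta)-DP.
    return rho + 2.0 * math.sqrt(rho * math.log(1.0 / delta))

def total_epsilon(sigmas, l2_sensitivity, delta):
    # zCDP composes additively, so the per-step rho values simply add up.
    rho = sum(gaussian_rho(l2_sensitivity, s) for s in sigmas)
    return zcdp_to_approx_dp(rho, delta)
```

For example, two steps with unit sensitivity and σ = 1 give ρ = 1, and with δ = e^-4 the bound is ε = 1 + 2·2 = 5. This sketch covers only the noise added to counts. In DPSU-style release, part of the δ budget is also spent on the chance of releasing an item contributed by very few users, which this function does not model.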

Findings

01. Significant utility improvement over state-of-the-art methods
02. Effective combination of DPSU and pruning heuristics
03. Applicable to NLP and sequence mining tasks
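The DPSU building block named in the findings can be sketched as follows. This is a simplified version in the style of the weighted Gaussian policy from the DPSU literature, not the paper's exact algorithm. The function name, the random-subset contribution bound, and the `threshold` parameter are assumptions. In particular, the threshold is left as an input here, whereas a real implementation must calibrate it to δ.

```python
import math
import random
from collections import defaultdict

def weighted_gaussian_dpsu(user_sets, max_items, sigma, threshold, rng=None):
    """Simplified weighted-Gaussian DPSU sketch (hypothetical, illustrative only).

    user_sets: one set of (sortable) items per user.
    max_items: contribution bound; each user keeps at most this many items.
    sigma:     standard deviation of the Gaussian noise on each item weight.
    threshold: release cutoff; an assumption here, must be calibrated to delta.
    """
    rng = rng or random.Random()
    weights = defaultdict(float)
    for items in user_sets:
        # Bound the user's contribution by sampling at most max_items items.
        kept = rng.sample(sorted(items), min(max_items, len(items)))
        for item in kept:
            # Each kept item gets 1/sqrt(k), so the user's weight vector has L2 norm 1.
            weights[item] += 1.0 / math.sqrt(len(kept))
    # Noise is added only to items that actually occur; release the noisy survivors.
    return {item for item, w in weights.items()
            if w + rng.gauss(0.0, sigma) > threshold}
```

With users `{"a", "b"}`, `{"a"}`, `{"c"}`, `max_items=2`, near-zero noise, and a threshold of 1.5, only `"a"` (weight about 1.71) is released. Because noise is added only to items that occur, an item held by a single user is released with some small probability. The threshold must be high enough that this probability stays within δ, which is why it cannot be chosen freely.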

Abstract

We revisit the problem of $n$-gram extraction in the differential privacy setting. In this problem, given a corpus of private text data, the goal is to release as many $n$-grams as possible while preserving user-level privacy. Extracting $n$-grams is a fundamental subroutine in many NLP applications such as sentence completion and response generation for emails. The problem also arises in other applications such as sequence mining, and is a generalization of the recently studied differentially private set union (DPSU) problem. In this paper, we develop a new differentially private algorithm for this problem which, in our experiments, significantly outperforms the state-of-the-art. Our improvements stem from combining recent advances in DPSU, privacy accounting, and new heuristics for pruning in the tree-based approach initiated by Chen et al. (2012).
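The tree-based approach mentioned in the abstract extracts n-grams one length at a time, using shorter released n-grams to prune longer candidates. A minimal sketch of that idea follows. It is not the authors' algorithm. The pruning rule shown (keep a k-gram only if both its (k-1)-gram prefix and suffix were released at the previous level), the simplified noisy-threshold step, and all function and parameter names are illustrative assumptions. How the privacy budget is split across levels is not shown.

```python
import math
import random
from collections import defaultdict

def ngrams(tokens, n):
    # All contiguous n-grams of a token list, as tuples.
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def noisy_threshold_union(user_sets, max_items, sigma, threshold, rng):
    # Simplified DPSU step (weighted-Gaussian style): unit L2 contribution per user,
    # Gaussian noise on each occurring item's weight, release items above the threshold.
    weights = defaultdict(float)
    for items in user_sets:
        kept = rng.sample(sorted(items), min(max_items, len(items)))
        for item in kept:
            weights[item] += 1.0 / math.sqrt(len(kept))
    return {g for g, w in weights.items() if w + rng.gauss(0.0, sigma) > threshold}

def tree_ngram_extraction(user_docs, max_n, max_items, sigma, threshold, seed=0):
    """Hypothetical level-by-level extraction: release 1-grams, then k-grams whose
    (k-1)-gram prefix and suffix were both released at the previous level."""
    rng = random.Random(seed)
    released = {}
    prev = None
    for k in range(1, max_n + 1):
        candidates = []
        for tokens in user_docs:
            grams = ngrams(tokens, k)
            if prev is not None:
                # Pruning: drop k-grams that do not extend released (k-1)-grams.
                grams = {g for g in grams if g[:-1] in prev and g[1:] in prev}
            candidates.append(grams)
        released[k] = noisy_threshold_union(candidates, max_items, sigma, threshold, rng)
        prev = released[k]
        if not prev:
            break  # nothing survived, so all longer n-grams would be pruned
    return released
```

On the three toy users `["the", "cat", "sat"]`, `["the", "cat", "ran"]`, `["the", "cat", "sat"]` with near-zero noise and a threshold of 1.0, the rare token `"ran"` is not released, so `("cat", "ran")` is pruned before it can consume any of the second user's contribution budget. The user's full unit weight then goes to `("the", "cat")`. Pruning is meant to produce this effect: it concentrates each user's bounded contribution on candidates that are more likely to survive.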


Videos

Differentially Private n-gram Extraction · SlidesLive

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Data Quality and Management · Cryptography and Data Security

Methods: Pruning