Differentially Private n-gram Extraction
Kunho Kim, Sivakanth Gopi, Janardhan Kulkarni, Sergey Yekhanin

TL;DR
This paper introduces a new differentially private algorithm for extracting n-grams from private text data. In the authors' experiments it significantly outperforms previous methods by combining recent advances in differentially private set union (DPSU) and privacy accounting with new pruning heuristics.
Contribution
The paper presents a differentially private n-gram extraction algorithm that builds on the tree-based approach of Chen et al. (2012), combining recent advances in DPSU and privacy accounting with new pruning heuristics.
Findings
Significant utility improvement over state-of-the-art methods
Effective combination of DPSU and pruning heuristics
Applicable to NLP and sequence mining tasks
Abstract
We revisit the problem of n-gram extraction in the differential privacy setting. In this problem, given a corpus of private text data, the goal is to release as many n-grams as possible while preserving user-level privacy. Extracting n-grams is a fundamental subroutine in many NLP applications, such as sentence completion and response generation for emails. The problem also arises in other applications such as sequence mining, and is a generalization of the recently studied differentially private set union (DPSU) problem. In this paper, we develop a new differentially private algorithm for this problem which, in our experiments, significantly outperforms the state-of-the-art. Our improvements stem from combining recent advances in DPSU, privacy accounting, and new heuristics for pruning in the tree-based approach initiated by Chen et al. (2012).
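To make the setting concrete, here is a minimal sketch of level-wise n-gram release by noisy thresholding. It is not the paper's algorithm. The function names, the Laplace noise, the per-user cap `max_per_user`, and the prefix/suffix pruning rule are all illustrative assumptions, and `noise_scale` and `threshold` are left to the caller: calibrating them (and accounting for composition across levels) to obtain a formal differential privacy guarantee is not attempted here.

```python
import random
from collections import Counter


def laplace(rng, scale):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    if scale == 0:
        return 0.0
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def noisy_release(user_items, max_per_user, noise_scale, threshold, rng):
    """Release items whose noisy user count exceeds `threshold`.

    user_items: list of sets, one set of candidate items per user.
    """
    counts = Counter()
    for items in user_items:
        # Bound each user's contribution: keep at most max_per_user items,
        # chosen at random.
        chosen = sorted(items)
        rng.shuffle(chosen)
        counts.update(chosen[:max_per_user])
    return {x for x, c in counts.items()
            if c + laplace(rng, noise_scale) > threshold}


def ngrams(tokens, k):
    """All contiguous k-grams of a token list, as tuples."""
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def extract_ngrams(user_texts, max_n, max_per_user, noise_scale,
                   threshold, seed=0):
    """Level-wise release: k-grams are built only on released (k-1)-grams.

    user_texts: list (one entry per user) of lists of token lists.
    Returns a dict mapping k to the set of released k-grams.
    """
    rng = random.Random(seed)
    released = {}
    for k in range(1, max_n + 1):
        per_user = []
        for texts in user_texts:
            grams = set()
            for tokens in texts:
                grams |= ngrams(tokens, k)
            if k > 1:
                # Assumed pruning rule: keep a k-gram only if both its
                # (k-1)-gram prefix and suffix were already released.
                prev = released[k - 1]
                grams = {g for g in grams
                         if g[:-1] in prev and g[1:] in prev}
            per_user.append(grams)
        released[k] = noisy_release(per_user, max_per_user,
                                    noise_scale, threshold, rng)
    return released
```

Because the pruning step depends only on sets that were already released, it restricts the candidate space at each level without touching the raw data again. Calling `extract_ngrams(users, max_n=3, max_per_user=10, noise_scale=1.0, threshold=5.0)` on a list of per-user token lists returns one released set per n-gram length.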
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Data Quality and Management · Cryptography and Data Security
Methods: Pruning
