With Privacy, Size Matters: On the Importance of Dataset Size in Differentially Private Text Rewriting
Stephen Meisenbacher, Florian Matthes

TL;DR
This paper investigates how dataset size impacts the effectiveness of differentially private text rewriting mechanisms in NLP, emphasizing the need for larger datasets for better privacy-utility trade-offs.
Contribution
The paper introduces dataset size as a factor in the evaluation of DP NLP mechanisms and provides a large-scale empirical analysis on datasets of up to one million texts.
Findings
Larger datasets improve privacy-utility balance in DP text rewriting
Dataset size significantly influences the efficacy of DP NLP methods
The findings call for more rigorous evaluation procedures in DP NLP
Abstract
Recent work in Differential Privacy with Natural Language Processing (DP NLP) has proposed numerous promising techniques in the form of text rewriting mechanisms. In the evaluation of these mechanisms, an often-ignored aspect is that of dataset size, or rather, the effect of dataset size on a mechanism's efficacy for utility and privacy preservation. In this work, we are the first to introduce this factor in the evaluation of DP text privatization, where we design utility and privacy tests on large-scale datasets with dynamic split sizes. We run these tests on datasets of varying size with up to one million texts, and we focus on quantifying the effect of increasing dataset size on the privacy-utility trade-off. Our findings reveal that dataset size plays an integral part in evaluating DP text rewriting mechanisms; additionally, these findings call for more rigorous evaluation…
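The idea of running utility and privacy tests on splits of varying size can be illustrated with a minimal sketch. This is not the authors' evaluation code: the names nested_splits and evaluate_over_sizes, and the rewrite, utility_score, and privacy_score callables, are hypothetical placeholders, and the use of nested random subsets is an assumption about one reasonable way to vary dataset size.

```python
import random


def nested_splits(texts, sizes, seed=0):
    """Return {size: subset}, where each smaller subset is a prefix of every larger one.

    Assumption: nested subsets are used so that scores at different sizes
    are computed on overlapping data and differ only in how much is included.
    Sizes larger than the dataset are skipped.
    """
    rng = random.Random(seed)
    shuffled = list(texts)
    rng.shuffle(shuffled)
    return {n: shuffled[:n] for n in sorted(sizes) if n <= len(shuffled)}


def evaluate_over_sizes(texts, sizes, rewrite, utility_score, privacy_score, seed=0):
    """Score a text rewriting mechanism at each dataset size.

    rewrite:        hypothetical callable, text -> privatized text
    utility_score:  hypothetical callable, (original, privatized) -> float
    privacy_score:  hypothetical callable, (original, privatized) -> float
    """
    results = []
    for n, subset in nested_splits(texts, sizes, seed).items():
        privatized = [rewrite(t) for t in subset]
        results.append(
            {
                "size": n,
                "utility": utility_score(subset, privatized),
                "privacy": privacy_score(subset, privatized),
            }
        )
    return results
```

In practice, rewrite would wrap a DP text rewriting mechanism at a fixed privacy budget, utility_score might train and test a downstream classifier on the privatized texts, and privacy_score might measure the success of an adversarial attack. Plotting both scores against size then shows how the trade-off shifts as the dataset grows.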
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Topic Modeling · Mobile Crowdsensing and Crowdsourcing
