Granularity is crucial when applying differential privacy to text: An investigation for neural machine translation
Doan Nam Long Vu, Timour Igamberdiev, Ivan Habernal

TL;DR
This paper explores how the choice of data granularity (sentence level versus document level) affects the effectiveness and privacy risks of applying differential privacy in neural machine translation systems.
Contribution
It investigates the effects of applying differential privacy at different granularities in NMT, highlighting the importance of selecting the appropriate level for privacy protection.
Findings
Document-level NMT offers better resistance to membership inference attacks.
Applying DP at the sentence level can lead to higher privacy risks.
Granularity choice significantly affects the privacy-utility trade-off in DP-NMT.
Abstract
Applying differential privacy (DP) by means of the DP-SGD algorithm to protect individual data points during training is becoming increasingly popular in NLP. However, the choice of granularity at which DP is applied is often neglected. For example, neural machine translation (NMT) typically operates on the sentence-level granularity. From the perspective of DP, this setup assumes that each sentence belongs to a single person and any two sentences in the training dataset are independent. This assumption is however violated in many real-world NMT datasets, e.g., those including dialogues. For proper application of DP we thus must shift from sentences to entire documents. In this paper, we investigate NMT at both the sentence and document levels, analyzing the privacy/utility trade-off for both scenarios, and evaluating the risks of not using the appropriate privacy granularity in terms…
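To make the granularity point in the abstract concrete, here is a minimal NumPy sketch of the clip-and-noise step of DP-SGD with a switchable privacy unit. It is an illustration under stated assumptions, not the authors' implementation: the function name `noisy_clipped_sum`, its parameters, and the toy gradients are all hypothetical, and real NMT training would compute per-example gradients of a model rather than take them as input.

```python
import numpy as np

def clip(grad, max_norm):
    """Scale grad down so its L2 norm is at most max_norm."""
    norm = np.linalg.norm(grad)
    return grad * min(1.0, max_norm / (norm + 1e-12))

def noisy_clipped_sum(per_sentence_grads, doc_ids, max_norm,
                      noise_multiplier, rng, level="sentence"):
    """Clip one gradient per privacy unit, sum them, and add Gaussian noise.

    Hypothetical helper for illustration only.
    level="sentence": every sentence is its own privacy unit.
    level="document": all sentences sharing a doc_id form one unit.
    """
    grads = np.asarray(per_sentence_grads, dtype=float)
    if level == "sentence":
        units = list(grads)
    elif level == "document":
        ids = np.asarray(doc_ids)
        # Aggregate a document's sentences BEFORE clipping, so the whole
        # document contributes at most max_norm to the update.
        units = [grads[ids == d].sum(axis=0) for d in np.unique(ids)]
    else:
        raise ValueError(f"unknown level: {level}")

    clipped_sum = np.stack([clip(u, max_norm) for u in units]).sum(axis=0)
    # Noise standard deviation is calibrated to the per-unit clipping bound.
    noise = rng.normal(0.0, noise_multiplier * max_norm, size=clipped_sum.shape)
    return clipped_sum + noise, len(units)

# Toy data: sentences 0 and 1 come from the same document (e.g. a dialogue).
rng = np.random.default_rng(0)
grads = [[3.0, 4.0], [3.0, 4.0], [0.0, 1.0]]
doc_ids = [0, 0, 1]
sent_update, n_sent = noisy_clipped_sum(grads, doc_ids, 1.0, 1.0, rng, "sentence")
doc_update, n_doc = noisy_clipped_sum(grads, doc_ids, 1.0, 1.0, rng, "document")
```

With sentence-level clipping, a document of k sentences can move the summed gradient by up to k times `max_norm`, while the noise is calibrated to a single `max_norm`. The nominal guarantee therefore covers one sentence, not a person who contributed a whole dialogue. Clipping after aggregating by document bounds that person's total contribution by `max_norm`, which is the shift from sentences to documents that the abstract argues for.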
Taxonomy
Topics: Privacy-Preserving Technologies in Data
