Differentially Private Language Generation and Identification in the Limit
Anay Mehrotra, Grigoris Velegkas, Xifan Yu, Felix Zhou

TL;DR
This paper studies how differential privacy affects language generation and identification in the limit. It shows that privacy carries no qualitative cost for generation from countable collections, but creates fundamental barriers for some identification problems.
Contribution
It introduces differentially private algorithms for language generation in the limit and characterizes when private language identification is possible and when it is not, including in adversarial settings.
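For reference, the standard form of the $\varepsilon$-differential-privacy guarantee that such algorithms must satisfy is sketched below. This sketch assumes that two input sequences $S, S'$ are neighbors when they differ in a single string, and that $O$ ranges over (measurable) sets of output streams. The paper's exact neighboring relation for the continual release model may be stated differently.

```latex
\Pr\big[\mathcal{A}(S) \in O\big] \;\le\; e^{\varepsilon}\,\Pr\big[\mathcal{A}(S') \in O\big]
\quad \text{for all neighboring input sequences } S, S' \text{ and all sets } O \text{ of output streams.}
```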
Findings
Generation in the limit from any countable collection of languages remains possible under privacy, at no qualitative cost.
For some finite collections, uniform private generation requires many more samples than the single sample that suffices non-privately.
Privacy creates fundamental barriers for certain language identification problems, especially in adversarial settings.
Abstract
We initiate the study of language generation in the limit, a model recently introduced by Kleinberg and Mullainathan [KM24], under the constraint of differential privacy. We consider the continual release model, where a generator must eventually output a stream of valid strings while protecting the privacy of the entire input sequence. Our first main result is that for countable collections of languages, privacy comes at no qualitative cost: we provide an $\varepsilon$-differentially-private algorithm that generates in the limit from any countable collection. This stands in contrast to many learning settings where privacy renders learnability impossible. However, privacy does impose a quantitative cost: there are finite collections of size $k$ for which uniform private generation requires $\Omega(k/\varepsilon)$ samples, whereas just one sample suffices non-privately. We then turn to…
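To make concrete what "generating in the limit" asks of a generator, here is a minimal non-private Python sketch for a finite collection. It follows the simple intersection strategy: after each revealed string, output an unseen string that lies in every language still consistent with the sample. Every language that does not contain the target $K$ is eventually contradicted by the adversary's enumeration. From that point on, the intersection of the consistent languages is exactly $K$, so every output is a valid unseen string. This is not the paper's private algorithm and offers no privacy. The integer encoding of strings, the predicate representation of languages, and the names `generate`, `run`, and `max_search` are all hypothetical choices made for illustration.

```python
from typing import Callable, Iterable, List, Optional, Sequence, Set

# Assumption for illustration: strings are encoded as non-negative integers,
# and a language is represented by its membership predicate.
Language = Callable[[int], bool]


def consistent_languages(collection: Sequence[Language], sample: Set[int]) -> List[Language]:
    """Keep the languages that contain every string enumerated so far."""
    return [lang for lang in collection if all(lang(x) for x in sample)]


def generate(collection: Sequence[Language], sample: Set[int],
             max_search: int = 10_000) -> Optional[int]:
    """Return the smallest unseen string lying in every consistent language.

    The search is capped at max_search (a hypothetical parameter) because early on
    the intersection of consistent languages may contain no unseen strings; in that
    case None is returned.
    """
    candidates = consistent_languages(collection, sample)
    for x in range(max_search):
        if x not in sample and all(lang(x) for lang in candidates):
            return x
    return None


def run(collection: Sequence[Language], enumeration: Iterable[int]) -> List[Optional[int]]:
    """Reveal the adversary's enumeration one string at a time and record each output."""
    sample: Set[int] = set()
    outputs: List[Optional[int]] = []
    for s in enumeration:
        sample.add(s)
        outputs.append(generate(collection, sample))
    return outputs


# Toy finite collection: even numbers, multiples of 4, and all strings.
def evens(x: int) -> bool:
    return x % 2 == 0


def multiples_of_4(x: int) -> bool:
    return x % 4 == 0


def everything(x: int) -> bool:
    return True


if __name__ == "__main__":
    # Target language K = evens; the adversary enumerates 0, 2, 4, 6.
    print(run([evens, multiples_of_4, everything], [0, 2, 4, 6]))  # [4, 4, 6, 8]
```

After the string 2 appears, "multiples of 4" is contradicted, and every later output is an unseen even number. The very first output, 4, is valid only by luck, which reflects the fact that generation in the limit constrains the generator's behavior only after some finite time.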
