Adaptively Private Next-Token Prediction of Large Language Models
James Flemings, Meisam Razaviyayn, and Murali Annavaram

TL;DR
This paper introduces AdaPMixED, an adaptive private decoding framework for large language models that reduces privacy loss significantly while maintaining utility, addressing scalability issues of previous DP methods.
Contribution
The paper proposes AdaPMixED, a novel adaptive private inference method for LLMs that improves privacy-utility trade-offs over existing fixed privacy level approaches.
Findings
Reduces privacy loss by 16x compared to prior methods
Maintains strong utility with 100K predictions
Achieves a data-dependent privacy loss of 5.25 even after 100K predictions
Abstract
As Large Language Models (LLMs) proliferate, developing privacy safeguards for these models is crucial. One popular safeguard involves training LLMs in a differentially private manner. However, such solutions are shown to be computationally expensive and detrimental to the utility of these models. Since LLMs are deployed on the cloud and thus only accessible via an API, a Machine Learning as a Service (MLaaS) provider can protect its downstream data by privatizing the predictions during the decoding process. However, the practicality of such solutions still largely lags behind DP training methods. One recent promising approach, Private Mixing of Ensemble Distributions (PMixED), avoids additive noise by sampling from the output distributions of private LLMs mixed with the output distribution of a public model. Yet, PMixED must satisfy a fixed privacy level for a given number of queries,…
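The mixing idea described in the abstract can be illustrated with a minimal sketch. It averages the next-token distributions of several private models, blends the result with a public model's distribution, and samples a token from the mixture rather than adding noise. The function names, the toy three-token vocabulary, and the single fixed mixing weight `lam` are all assumptions made for illustration; the paper selects its mixing weights through a privacy analysis that this sketch does not reproduce, so the code carries no privacy guarantee by itself.

```python
import random


def mix_distributions(private_dists, public_dist, lam):
    """Blend the average of several private next-token distributions
    with a public distribution.

    lam is a hypothetical fixed mixing weight in [0, 1]:
    lam = 0 returns the public distribution, lam = 1 the private average.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be in [0, 1]")
    if not private_dists:
        raise ValueError("need at least one private distribution")
    vocab = len(public_dist)
    if any(len(p) != vocab for p in private_dists):
        raise ValueError("all distributions must share one vocabulary size")

    mixed = []
    for v in range(vocab):
        # Ensemble step: average the private models' probability for token v.
        avg_private = sum(p[v] for p in private_dists) / len(private_dists)
        # Mixing step: convex combination with the public probability.
        mixed.append(lam * avg_private + (1.0 - lam) * public_dist[v])

    # Renormalize to guard against floating-point drift.
    total = sum(mixed)
    return [m / total for m in mixed]


def sample_token(dist, rng):
    """Sample a token index from the distribution (no additive noise)."""
    return rng.choices(range(len(dist)), weights=dist, k=1)[0]


# Toy example: two private models and one public model over 3 tokens.
private = [[0.7, 0.2, 0.1], [0.5, 0.3, 0.2]]
public = [0.2, 0.5, 0.3]
mixed = mix_distributions(private, public, lam=0.5)
token = sample_token(mixed, random.Random(0))
```

A smaller `lam` pulls the sampled distribution toward the public model, which is the direction one would expect to leak less about the private data at some cost in utility.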
Peer Reviews
Decision: ICLR 2025 Conference Withdrawn Submission
Reasonably novel idea that reintroduces work from PATE, which was largely overlooked. Well structured, and the problem is well introduced. See Questions.
I think that the general framing of the paper is good, but more work can be done to clarify the specific contributions and algorithms of the paper. For example, I was a bit confused about what the noisy screening method concretely is. Why is it not the sparse vector technique, or Confident-GNMax from PATE? Perhaps I missed something, but it feels like the algorithm could be more clearly specified. See Questions.
- The paper is well-written and the evaluation includes comparing AdaPMixED with other DP methods across different datasets. The results demonstrated large privacy gains while maintaining utility, especially in high-query situations.
- The proposed method is clearly explained and simple to implement, making the previous approach PMixED more practical.
- The novelty is somewhat limited, as only the noise screen filtering mechanism seems original. The data-dependent privacy analysis is an application of an existing approach to PMixED. Although the authors emphasize the importance of noise screen filtering, the results in Table 2 suggest that the majority of the privacy-loss gain comes from the data-dependent privacy analysis.
- The model used in experiments (GPT2) seems a bit outdated; it is unclear how the privacy and utility trade-off really loo
The authors present several important advantages of AdaPMixED for privacy-preserving LLMs. First, AdaPMixED is highly scalable and capable of handling up to 100,000 queries with little impact on privacy and utility, which is a significant improvement over both PMixED and traditional DP methods. This makes it well suited to practical applications where MLaaS is widely used. Furthermore, this method significantly reduces privacy loss by dynamically adjusting privacy parameters based on rea
In my opinion, a key weakness of the AdaPMixED framework is its handling of situations where there are large divergences between private and public model outputs. In this case, the system defaults to using the output of the public model to reduce privacy risks. However, this situation illustrates that the output of private models is very important and should not be ignored. For example, the private model may provide personalized output that the public model alone cannot provide. Furthermore, th
Main strengths:
- The privacy-utility improvements of AdaPMixED can be particularly significant, because they show that next-token prediction can be practical, compared to well-studied alternatives like DP-SGD. I was personally not aware of the PMixED line of work, and I used to consider that next-token was a dead-end in terms of DP, since privacy loss accumulates with the number of queries. It turns out that mixing public predictions with private predictions, along with a careful privacy analys
Data-dependent loss:
- The data-dependent loss is a major flaw of this paper in my opinion, and the reason for my overall "reject" score. I am willing to change my score if I can be convinced that the current results are fair (maybe I misunderstood something), or with new results that incorporate a sanitized version of the data-dependent loss.
- My problem with data-dependent loss is that it can leak privacy, since it depends on the data (more specifically, it depends on the whole database passed
Taxonomy
Topics: Privacy-Preserving Technologies in Data
