DPZero: Private Fine-Tuning of Language Models without Backpropagation
Liang Zhang, Bingcong Li, Kiran Koshy Thekumparampil, Sewoong Oh, Niao He

TL;DR
DPZero is a memory-efficient, differentially private zeroth-order method for fine-tuning large language models. It adapts a model to sensitive data using only forward passes, with no backpropagation, which makes private fine-tuning practical for large models.
Contribution
The paper proposes DPZero, a novel private zeroth-order optimization algorithm with nearly dimension-independent rates for fine-tuning large language models.
Findings
Successfully fine-tuned RoBERTa and OPT privately using DPZero.
Demonstrated significant memory savings during training.
Achieved competitive performance on downstream tasks.
Abstract
The widespread practice of fine-tuning large language models (LLMs) on domain-specific data faces two major challenges in memory and privacy. First, as the size of LLMs continues to grow, the memory demands of gradient-based training methods via backpropagation become prohibitively high. Second, given the tendency of LLMs to memorize training data, it is important to protect potentially sensitive information in the fine-tuning data from being regurgitated. Zeroth-order methods, which rely solely on forward passes, substantially reduce memory consumption during training. However, directly combining them with standard differentially private gradient descent suffers more as model size grows. To bridge this gap, we introduce DPZero, a novel private zeroth-order algorithm with nearly dimension-independent rates. The memory efficiency of DPZero is demonstrated in privately fine-tuning RoBERTa…
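The forward-pass-only idea in the abstract can be illustrated with a generic two-point zeroth-order update. This is a minimal sketch under stated assumptions, not the authors' algorithm: the abstract does not spell out DPZero's mechanism, so the privatization shown here (clipping each per-example scalar difference and adding Gaussian noise to their mean) is an assumption made for illustration, and the names `zo_private_step`, `loss_fn`, `mu`, `clip`, and `sigma` are hypothetical. No privacy accounting is performed.

```python
import numpy as np

def zo_private_step(loss_fn, w, batch, rng, lr=0.1, mu=1e-3, clip=1.0, sigma=0.0):
    """One two-point zeroth-order update that only evaluates loss_fn (no gradients).

    Illustrative sketch: the clipping and noise scheme is an assumption,
    not taken from the DPZero paper.
    """
    # Random search direction, drawn independently of the training data.
    u = rng.standard_normal(w.shape)

    # Per-example central finite differences along u: two forward passes each.
    diffs = np.array([
        (loss_fn(w + mu * u, ex) - loss_fn(w - mu * u, ex)) / (2.0 * mu)
        for ex in batch
    ])

    # Assumed privatization: bound each example's scalar contribution,
    # then add scalar Gaussian noise to the batch mean.
    diffs = np.clip(diffs, -clip, clip)
    noise = sigma * clip / len(batch) * rng.standard_normal()
    g_scalar = diffs.mean() + noise

    # The update moves along u, scaled by the (noisy) directional derivative.
    return w - lr * g_scalar * u
```

Because only the direction `u` and a handful of scalars are kept per step, no activations or parameter-sized gradients need to be stored for a backward pass, which is the source of the memory savings the abstract describes. In this sketch the data-dependent quantity that gets noised is a single scalar rather than a full gradient vector.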
Taxonomy
Topics
Topic Modeling · Domain Adaptation and Few-Shot Learning · Stochastic Gradient Optimization Techniques
Methods
Attention Is All You Need · OPT · Linear Layer · Multi-Head Attention · Layer Normalization · Residual Connection · Adam · Attention Dropout · WordPiece
