DPZero: Private Fine-Tuning of Language Models without Backpropagation

Liang Zhang; Bingcong Li; Kiran Koshy Thekumparampil; Sewoong Oh; Niao He

arXiv:2310.09639 · cs.LG · June 7, 2024 · 1 citation

TL;DR

DPZero is a memory-efficient, differentially private zeroth-order method for fine-tuning large language models. It relies only on forward passes, so it avoids the memory cost of backpropagation while protecting sensitive fine-tuning data.

Contribution

The paper proposes DPZero, a novel private zeroth-order optimization algorithm with nearly dimension-independent rates, and applies it to fine-tuning large language models.

Findings

01 Successfully fine-tuned RoBERTa and OPT privately using DPZero.

02 Demonstrated significant memory savings during training.

03 Achieved competitive performance on downstream tasks.

Abstract

The widespread practice of fine-tuning large language models (LLMs) on domain-specific data faces two major challenges in memory and privacy. First, as the size of LLMs continues to grow, the memory demands of gradient-based training methods via backpropagation become prohibitively high. Second, given the tendency of LLMs to memorize training data, it is important to protect potentially sensitive information in the fine-tuning data from being regurgitated. Zeroth-order methods, which rely solely on forward passes, substantially reduce memory consumption during training. However, directly combining them with standard differentially private gradient descent suffers more as model size grows. To bridge this gap, we introduce DPZero, a novel private zeroth-order algorithm with nearly dimension-independent rates. The memory efficiency of DPZero is demonstrated in privately fine-tuning RoBERTa…
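
The idea of a zeroth-order update built from forward passes alone can be illustrated with a small sketch. This is not the paper's algorithm or the code in the official repository: the function name `zo_private_step`, its parameters, and the choice to clip each per-example scalar finite difference and add Gaussian noise to their sum are all assumptions made for illustration. Calibrating `noise_std` to a privacy budget is not shown.

```python
import random


def zo_private_step(loss_fn, params, batch, lr=0.02, smoothing=1e-3,
                    clip=1.0, noise_std=0.0, rng=None):
    """One illustrative zeroth-order step using two forward passes per example.

    Hypothetical sketch: per-example scalar finite differences are clipped to
    [-clip, clip], summed, perturbed with Gaussian noise, and averaged.
    """
    if rng is None:
        rng = random.Random(0)
    # One random Gaussian direction shared by every example in the batch.
    u = [rng.gauss(0.0, 1.0) for _ in params]
    plus = [p + smoothing * ui for p, ui in zip(params, u)]
    minus = [p - smoothing * ui for p, ui in zip(params, u)]

    total = 0.0
    for example in batch:
        # Central finite difference along u: a scalar, no backpropagation.
        diff = (loss_fn(plus, example) - loss_fn(minus, example)) / (2.0 * smoothing)
        # Assumed privatization step: bound each example's contribution.
        total += max(-clip, min(clip, diff))

    # Assumed privatization step: scalar Gaussian noise on the clipped sum.
    noisy = (total + rng.gauss(0.0, noise_std)) / len(batch)
    # Move against the estimated directional derivative along u.
    return [p - lr * noisy * ui for p, ui in zip(params, u)]


def squared_error(w, x):
    """Toy loss standing in for a model's forward pass."""
    return sum((wi - xi) ** 2 for wi, xi in zip(w, x))
```

In this sketch only a single scalar per example is clipped and noised, rather than a full gradient vector, and no gradients are ever stored. A toy run would call `zo_private_step(squared_error, w, batch, rng=rng)` in a loop, reusing one `random.Random` instance so that each step draws a fresh direction.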

Code & Models

Repositories

liang137/dpzero (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Domain Adaptation and Few-Shot Learning · Stochastic Gradient Optimization Techniques

Methods: Attention Is All You Need · OPT · Linear Layer · Multi-Head Attention · Layer Normalization · Residual Connection · Adam · Attention Dropout · WordPiece