Differentially Private Learning Needs Better Model Initialization and Self-Distillation

Ivoline C. Ngong; Joseph P. Near; Niloofar Mireshghallah

arXiv:2410.17566 · cs.LG · October 24, 2024

TL;DR

This paper introduces DPRefine, a three-phase method that enhances differentially private language model training by improving initialization and output quality, significantly outperforming standard DPSGD in utility and linguistic accuracy.

Contribution

The paper proposes DPRefine, a novel three-phase approach combining data synthesis, DP finetuning, and self-distillation to improve privacy-preserving language model training.
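
For context, the DP finetuning phase builds on DPSGD, whose core update clips each example's gradient and adds Gaussian noise before averaging. The block below is a minimal, generic sketch of that textbook update in plain Python; it is not the authors' implementation, and the names `clip_gradient`, `dp_sgd_step`, `max_norm`, and `noise_multiplier` are illustrative assumptions. Privacy accounting (tracking the spent epsilon) is omitted.

```python
import math
import random


def clip_gradient(grad, max_norm):
    """Scale one example's gradient so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_step(params, per_example_grads, lr, max_norm, noise_multiplier, rng):
    """One generic DP-SGD update (hypothetical helper, not from the paper).

    Clips every per-example gradient, sums the clipped gradients, adds
    Gaussian noise with standard deviation noise_multiplier * max_norm to
    each coordinate, then averages and takes a gradient step.
    """
    batch_size = len(per_example_grads)
    clipped = [clip_gradient(g, max_norm) for g in per_example_grads]
    summed = [sum(column) for column in zip(*clipped)]
    sigma = noise_multiplier * max_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [p - lr * n / batch_size for p, n in zip(params, noisy)]


# Toy usage: two parameters, two examples, fixed seed for repeatability.
rng = random.Random(0)
params = [0.0, 0.0]
grads = [[3.0, 4.0], [0.3, 0.4]]  # first gradient has norm 5 and gets clipped
new_params = dp_sgd_step(params, grads, lr=0.1, max_norm=1.0,
                         noise_multiplier=1.0, rng=rng)
```

The clipping bound limits how much any single example can move the model, and the added noise masks what remains. These same two steps are the usual explanation for the utility loss that the abstract attributes to DPSGD.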

Findings

01. DPRefine outperforms vanilla DPSGD in human evaluations.

02. Reduces linguistic errors in generated text by 84%.

03. Small models like GPT-2 are effective for initialization and distillation.

Abstract

Differentially private SGD (DPSGD) enables privacy-preserving training of language models, but often reduces utility, diversity, and linguistic quality. We introduce DPRefine, a three-phase method that initializes a model using data synthesis from a small pre-trained LM with rigorous filtering, applies DP finetuning on private data, and performs self-distillation to refine outputs. This approach significantly outperforms vanilla DPSGD, with AlpacaEval preferring DPRefine's generations in 78.4% of cases across all datasets. Our analysis reveals that DPRefine reduces linguistic errors in generated text by 84.0%, mitigating the grammar and spelling errors commonly associated with DPSGD. It also reduces inconsistencies of non-private models, such as hallucinated details and misattributed quotes. We find that small models like GPT-2 can be effective for initialization and distillation,…

Code & Models

Repositories

uvm-plaid/private_llm_generation
PyTorch · Official

Videos

Differentially Private Learning Needs Better Model Initialization and Self-Distillation · underline

Taxonomy

Methods: Attention Is All You Need · Dropout · Byte Pair Encoding · Dense Connections · Layer Normalization · Residual Connection · Cosine Annealing · Weight Decay · Linear Warmup With Cosine Annealing