PACZero: PAC-Private Fine-Tuning of Language Models via Sign Quantization
Murat Bilgehan Ertan, Xiaochen Zhu, Phuong Ha Nguyen, Marten van Dijk, Srinivas Devadas

TL;DR
PACZero is a privacy-preserving fine-tuning method for large language models that uses sign quantization to retain high utility at zero mutual information, the regime in which a membership-inference attacker's posterior success rate is bounded at the prior.
Contribution
It proposes PACZero, a family of PAC-private zeroth-order mechanisms that enable effective fine-tuning of language models with strong privacy guarantees and competitive performance.
Findings
PACZero-ZPL achieves 88.99% accuracy on SST-2 at zero mutual information.
PACZero methods outperform prior approaches in high-privacy regimes.
The approach maintains utility close to non-private baselines at strict privacy levels.
Abstract
We introduce PACZero, a family of PAC-private zeroth-order mechanisms for fine-tuning large language models that delivers usable utility at zero mutual information. This privacy regime bounds the membership-inference attack (MIA) posterior success rate at the prior, an MIA-resistance level the DP framework matches only at ε = 0 and infinite noise. All DP-ZO comparisons below are matched at the MIA posterior level. The key insight is that PAC Privacy charges mutual information only when the release depends on which candidate subset is the secret. Sign-quantizing subset-aggregated zeroth-order gradients creates frequent unanimity: steps at which every candidate subset agrees on the update direction. At these steps the released sign costs zero conditional mutual information. We propose two variants that span the privacy-utility trade-off: PACZero-MI (budgeted MI via exact calibration…
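The unanimity idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the helper names (`make_loss`, `zo_projection`, `unanimous_sign`), the toy quadratic losses, and the choice to return `None` on disagreement are all assumptions made for illustration. The abstract as shown here is truncated before it explains how non-unanimous steps are handled, so the sketch only flags them.

```python
def make_loss(target):
    """Hypothetical per-subset loss: squared distance of theta to a target value."""
    return lambda theta: sum((t - target) ** 2 for t in theta)

def zo_projection(loss, theta, z, mu=1e-3):
    """Two-point zeroth-order estimate of the directional derivative of loss along z."""
    plus = [t + mu * zi for t, zi in zip(theta, z)]
    minus = [t - mu * zi for t, zi in zip(theta, z)]
    return (loss(plus) - loss(minus)) / (2 * mu)

def sign(x):
    """Quantize a scalar to +1 or -1 (ties mapped to +1, an arbitrary choice)."""
    return 1 if x >= 0 else -1

def unanimous_sign(subset_losses, theta, z, mu=1e-3):
    """Return the shared sign if every candidate subset agrees, else None.

    When all subsets agree, the released sign is the same no matter which
    subset is the secret, so the release carries no information about it.
    """
    signs = {sign(zo_projection(loss, theta, z, mu)) for loss in subset_losses}
    return signs.pop() if len(signs) == 1 else None

theta = [0.0, 0.0]
z = [1.0, 1.0]  # fixed perturbation direction for reproducibility

# Three candidate subsets whose losses all pull theta in the same direction.
agree = unanimous_sign([make_loss(1.0), make_loss(1.2), make_loss(0.8)], theta, z)

# Two candidate subsets that pull in opposite directions.
disagree = unanimous_sign([make_loss(1.0), make_loss(-1.0)], theta, z)
```

In the first call every subset yields a negative directional derivative, so the sign is unanimous; in the second the subsets disagree and the sketch returns `None` to mark a step whose release would depend on the secret subset.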
