AlignDP: Hybrid Differential Privacy with Rarity-Aware Protection for LLMs
Madhava Gaikwad

TL;DR
AlignDP introduces a hybrid differential privacy approach for large language models, protecting rare and frequent data fields differently to prevent knowledge leakage while maintaining utility.
Contribution
The paper proposes a novel two-tier privacy mechanism combining PAC indistinguishability and RAPPOR for enhanced LLM data protection.
Findings
Rare categories remain hidden under privacy guarantees.
Frequent categories are accurately recovered with small error.
Theoretical bounds and utility trade-offs are established.
Abstract
Large language models are exposed to risks of extraction, distillation, and unauthorized fine-tuning. Existing defenses use watermarking or monitoring, but these act after leakage. We design AlignDP, a hybrid privacy lock that blocks knowledge transfer at the data interface. The key idea is to separate rare and non-rare fields. Rare fields are shielded by PAC indistinguishability, giving effective zero-epsilon local DP. Non-rare fields are privatized with RAPPOR, giving unbiased frequency estimates under local DP. A global aggregator enforces composition and budget. This two-tier design hides rare events and adds controlled noise to frequent events. We prove limits of PAC extension to global aggregation, give bounds for RAPPOR estimates, and analyze utility trade-off. A toy simulation confirms feasibility: rare categories remain hidden, frequent categories are recovered with small error.
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsPrivacy-Preserving Technologies in Data · Adversarial Robustness in Machine Learning · Cryptography and Data Security
