Privacy Regularization: Joint Privacy-Utility Optimization in Language Models

Fatemehsadat Mireshghallah; Huseyin A. Inan; Marcello Hasegawa; Victor Rühle; Taylor Berg-Kirkpatrick; Robert Sim

arXiv:2103.07567 · cs.LG · April 19, 2021 · 5 cites

Open Access

TL;DR

This paper introduces two novel privacy-preserving regularization techniques for training language models that improve the utility-privacy trade-off, ensure fairness across subgroups, and offer faster training compared to differential privacy methods.

Contribution

The paper proposes two new regularization methods for language models that jointly optimize privacy and utility, outperforming differential privacy in utility, training speed, and fairness.

Findings

1. The regularizers achieve a better utility-privacy trade-off than differential privacy (DP).

2. The methods train faster and are compatible with existing optimization approaches.

3. The methods ensure fair treatment of under-represented subgroups.

Abstract

Neural language models are known to have a high capacity for memorization of training samples. This may have serious privacy implications when training models on user content such as email correspondence. Differential privacy (DP), a popular choice to train models with privacy guarantees, comes with significant costs in terms of utility degradation and disparate impact on subgroups of users. In this work, we introduce two privacy-preserving regularization methods for training language models that enable joint optimization of utility and privacy through (1) the use of a discriminator and (2) the inclusion of a triplet-loss term. We compare our methods with DP through extensive evaluation. We show the advantages of our regularizers with favorable utility-privacy trade-off, faster training with the ability to tap into existing optimization approaches, and ensuring uniform treatment of…
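
To make the idea of a triplet-loss regularizer concrete, here is a minimal sketch of a standard triplet margin loss added to a utility loss. It is not the authors' implementation: the abstract does not say which representations serve as anchor, positive, and negative, nor how the term is weighted, so the function names, the Euclidean distance, and the `weight` and `margin` parameters below are all illustrative assumptions.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss: max(0, d(a, p) - d(a, n) + margin).

    The loss is zero once the negative is at least `margin` farther
    from the anchor than the positive is.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def joint_loss(utility_loss, anchor, positive, negative, weight=0.5, margin=1.0):
    """Hypothetical joint objective: utility loss plus a weighted triplet term.

    `utility_loss` stands in for the language-modeling loss; `weight` is an
    assumed trade-off coefficient, not a value taken from the paper.
    """
    return utility_loss + weight * triplet_loss(anchor, positive, negative, margin)


if __name__ == "__main__":
    # Positive is far from the anchor and negative coincides with it,
    # so the triplet term is active and adds to the utility loss.
    print(joint_loss(2.0, [0.0, 0.0], [3.0, 4.0], [0.0, 0.0]))
```

Because the regularizer is just an extra term in the loss, a sketch like this can be minimized with any ordinary optimizer, which is consistent with the abstract's point about tapping into existing optimization approaches.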

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Mobile Crowdsensing and Crowdsourcing · Artificial Intelligence in Healthcare and Education