Privacy Adhering Machine Un-learning in NLP

Vinayshekhar Bannihatti Kumar; Rashmi Gangadharaiah; Dan Roth

arXiv:2212.09573 · cs.CL · December 20, 2022


Open Access

TL;DR

This paper introduces efficient machine unlearning methods for NLP tasks, enabling rapid data removal from models with minimal performance impact, addressing privacy regulations like GDPR and CCPA.

Contribution

The paper proposes two new, computationally efficient unlearning approaches for NLP models, SISA-FC and SISA-A, that substantially reduce resource usage while maintaining accuracy.
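SISA-FC and SISA-A build on SISA-style training, in which the training data is split into disjoint shards, one constituent model is trained per shard, and the models' predictions are aggregated. Deleting an example then only requires retraining the shard that held it. The minimal sketch below illustrates only that generic sharding idea, not the authors' SISA-FC or SISA-A implementation. It assumes toy two-dimensional feature vectors, a nearest-centroid classifier standing in for a real NLP model, round-robin shard assignment, and majority-vote aggregation. It also omits SISA's slicing/checkpointing. All names (`ShardedEnsemble`, `train_centroids`, `forget`) are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# (example_id, feature_vector, label): a toy stand-in for an encoded NLP example.
Example = Tuple[str, Tuple[float, ...], int]
Model = Dict[int, Tuple[float, ...]]  # label -> centroid


def train_centroids(data: Sequence[Example]) -> Model:
    """Toy constituent model (assumption): one mean feature vector per label."""
    sums: Dict[int, List[float]] = {}
    counts: Counter = Counter()
    for _, x, y in data:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] += 1
    return {y: tuple(s / counts[y] for s in total) for y, total in sums.items()}


def predict_centroids(model: Model, x: Tuple[float, ...]) -> int:
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    return min(model, key=lambda y: sum((a - b) ** 2 for a, b in zip(model[y], x)))


class ShardedEnsemble:
    """SISA-style sketch: disjoint shards, one model per shard, majority vote."""

    def __init__(self, data: Sequence[Example], num_shards: int) -> None:
        self.shards: List[List[Example]] = [[] for _ in range(num_shards)]
        self.location: Dict[str, int] = {}  # example_id -> shard index
        for idx, ex in enumerate(data):
            k = idx % num_shards  # round-robin split into disjoint shards
            self.shards[k].append(ex)
            self.location[ex[0]] = k
        self.models: List[Model] = [train_centroids(s) for s in self.shards]
        self.retrained: List[int] = []  # shard indices retrained by forget()

    def predict(self, x: Tuple[float, ...]) -> int:
        # Aggregate: majority vote over shards that still have a trained model.
        votes = Counter(predict_centroids(m, x) for m in self.models if m)
        return votes.most_common(1)[0][0]

    def forget(self, example_id: str) -> None:
        # Unlearn one example by retraining only the shard that contained it.
        k = self.location.pop(example_id)
        self.shards[k] = [ex for ex in self.shards[k] if ex[0] != example_id]
        self.models[k] = train_centroids(self.shards[k])
        self.retrained.append(k)


# Usage: two well-separated toy clusters, three shards.
data = [(f"a{i}", (0.1 * i, 0.0), 0) for i in range(6)] + [
    (f"b{i}", (5.0, 5.0 + 0.1 * i), 1) for i in range(6)
]
ens = ShardedEnsemble(data, num_shards=3)
ens.forget("a3")
print(ens.retrained)  # [0]
print(ens.predict((0.2, 0.1)), ens.predict((5.0, 5.2)))  # 0 1
```

The removal request touches only shard 0. The other two constituent models are left untouched, which is where SISA-style methods get their speedup over full retraining.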

Findings

01. Achieved a 90-95% reduction in memory usage
02. Reduced unlearning time by a factor of 100
03. Maintained model performance after unlearning
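One way to see where a large memory saving can come from is to compare two storage schemes. In the first, each of K shards keeps its own full model, so storage grows linearly with K. In the second, a single frozen encoder is shared across shards and only a small per-shard head is trained, so storage stays close to one model and the saving approaches 1 - 1/K. The sketch below assumes exactly that hypothetical shared-encoder setup. The encoder size (about 110M parameters, roughly BERT-base), the two-class linear head, and the shard counts are illustrative assumptions, not the paper's configuration, and the printed values are not the paper's measurements.

```python
def full_copy_params(encoder: int, head: int, num_shards: int) -> int:
    """Every shard stores its own full model (encoder + classification head)."""
    return num_shards * (encoder + head)


def shared_encoder_params(encoder: int, head: int, num_shards: int) -> int:
    """Hypothetical variant: one frozen encoder shared by all shards, one head per shard."""
    return encoder + num_shards * head


def memory_reduction(encoder: int, head: int, num_shards: int) -> float:
    """Fraction of stored parameters saved by the shared-encoder variant."""
    shared = shared_encoder_params(encoder, head, num_shards)
    full = full_copy_params(encoder, head, num_shards)
    return 1 - shared / full


ENCODER = 110_000_000  # assumed encoder size, roughly BERT-base
HEAD = 768 * 2 + 2  # assumed linear head: 768-dim input, 2 classes, with bias

for shards in (10, 20):
    print(shards, f"{memory_reduction(ENCODER, HEAD, shards):.3f}")
# prints:
# 10 0.900
# 20 0.950
```

Because the head is tiny relative to the encoder, the saving is governed almost entirely by the shard count in this hypothetical setup.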

Abstract

Regulations such as the General Data Protection Regulation (GDPR) in the EU and the California Consumer Privacy Act (CCPA) in the US include provisions on the "right to be forgotten", which require industry applications to remove data related to an individual from their systems. In many real-world industry applications that use machine learning to build models on user data, such mandates require significant effort in both data cleansing and model retraining, while ensuring that the models do not deteriorate in prediction quality because of the removed data. As a result, continuously removing data and retraining models does not scale when these applications receive such requests at a very high frequency. Recently, a few researchers proposed the idea of "machine unlearning" to tackle this challenge. Despite the significant importance of this task, the area of…


Taxonomy

Topics: Privacy-Preserving Technologies in Data · Data Quality and Management · Machine Learning in Healthcare