Machine Unlearning for Document Classification

Lei Kang; Mohamed Ali Souibgui; Fei Yang; Lluis Gomez; Ernest Valveny; Dimosthenis Karatzas

arXiv:2404.19031·cs.CV·May 1, 2024

TL;DR

This paper introduces the first study of machine unlearning in document classification, enabling models to forget specific data to enhance privacy while maintaining performance.

Contribution

It pioneers applying machine unlearning techniques to document classification, addressing privacy concerns with a realistic server-side scenario.

Findings

1. First investigation into machine unlearning for document classification

2. Proposes a practical scenario with limited data access for efficient forgetting

3. Code is publicly available for reproducibility

Abstract

Document understanding models have recently demonstrated remarkable performance by leveraging extensive collections of user documents. However, since documents often contain large amounts of personal data, their usage can pose a threat to user privacy and weaken the bonds of trust between humans and AI services. In response to these concerns, legislation advocating "the right to be forgotten" has recently been proposed, allowing users to request the removal of private information from computer systems and neural network models. A novel approach, known as machine unlearning, has emerged to make AI models forget about a particular class of data. In our research, we explore machine unlearning for document classification problems, representing, to the best of our knowledge, the first investigation into this area. Specifically, we consider a realistic scenario where a remote server houses a…

Code & Models

Repositories

leitro/machineunlearning-docclassification (PyTorch, official)

Taxonomy

Topics: Mathematics, Computing, and Information Processing