Machine Unlearning for Document Classification
Lei Kang, Mohamed Ali Souibgui, Fei Yang, Lluis Gomez, Ernest Valveny, Dimosthenis Karatzas

TL;DR
This paper introduces the first study of machine unlearning in document classification, enabling models to forget specific data to enhance privacy while maintaining performance.
Contribution
It applies machine unlearning techniques to document classification for the first time, addressing privacy concerns in a realistic server-side scenario.
Findings
First investigation into machine unlearning for document classification
Proposes a practical scenario with limited data access for efficient forgetting
Code is publicly available for reproducibility
Abstract
Document understanding models have recently demonstrated remarkable performance by leveraging extensive collections of user documents. However, since documents often contain large amounts of personal data, their usage can pose a threat to user privacy and weaken the bonds of trust between humans and AI services. In response to these concerns, legislation advocating "the right to be forgotten" has recently been proposed, allowing users to request the removal of private information from computer systems and neural network models. A novel approach, known as machine unlearning, has emerged to make AI models forget about a particular class of data. In our research, we explore machine unlearning for document classification problems, representing, to the best of our knowledge, the first investigation into this area. Specifically, we consider a realistic scenario where a remote server houses a…
Taxonomy
Topics: Mathematics, Computing, and Information Processing
