Privacy-Aware Lifelong Learning
Ozan Özdenizci, Elmar Rueckert, Robert Legenstein

TL;DR
This paper introduces PALL, a novel lifelong learning framework that enables models to learn continuously while ensuring privacy by allowing exact unlearning of sensitive data, all within a single neural network.
Contribution
The paper presents a unified approach combining lifelong learning and privacy-aware unlearning using sparse subnetworks and episodic memory, addressing a critical gap in responsible AI.
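As a rough illustration of how sparse subnetworks can make task-level unlearning exact, the following minimal sketch keeps one binary mask per task over a shared parameter vector. It is not the authors' implementation: the class `MaskedModel`, its methods `learn` and `unlearn`, the random weight assignment standing in for training, and the reset-to-zero rule are all assumptions made for this example. The only ideas taken from the summary are that each task uses a sparse subset of a single network's parameters and that a task can be removed exactly.

```python
import random


class MaskedModel:
    """Toy shared parameter vector with one binary mask per task (hypothetical)."""

    def __init__(self, n_params, seed=0):
        self.rng = random.Random(seed)
        self.weights = [0.0] * n_params
        self.owner = [None] * n_params  # task that trained each parameter
        self.masks = {}                 # task_id -> set of parameter indices used

    def learn(self, task_id, n_new, reuse_from=()):
        """Claim n_new free parameters and optionally reuse frozen ones."""
        free = [i for i, o in enumerate(self.owner) if o is None]
        if n_new > len(free):
            raise ValueError("not enough free parameters")
        new_idx = self.rng.sample(free, n_new)
        for i in new_idx:
            self.owner[i] = task_id
            # Assumption: a random value stands in for actual training.
            self.weights[i] = self.rng.uniform(-1.0, 1.0)
        reused = set()
        for other in reuse_from:
            # Reuse only parameters the other task trained itself (kept frozen).
            reused |= {i for i in self.masks[other] if self.owner[i] == other}
        self.masks[task_id] = set(new_idx) | reused

    def unlearn(self, task_id):
        """Reset all parameters trained on task_id; return tasks that reused them."""
        owned = {i for i, o in enumerate(self.owner) if o == task_id}
        for i in owned:
            self.weights[i] = 0.0
            self.owner[i] = None
        del self.masks[task_id]
        affected = []
        for other, mask in self.masks.items():
            if mask & owned:
                mask -= owned  # drop the reset parameters from the other mask
                affected.append(other)
        return sorted(affected)


model = MaskedModel(n_params=20)
model.learn("A", n_new=5)
model.learn("B", n_new=5, reuse_from=["A"])
affected = model.unlearn("A")  # tasks listed here would need retraining
```

In this toy version, no parameter trained on the unlearned task survives, which is what makes the removal exact by construction. The tasks returned by `unlearn` lose part of their subnetwork; restoring their accuracy, for example by replaying a small memory buffer, is outside the scope of the sketch.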
Findings
PALL achieves scalable lifelong learning across various architectures.
It enables exact unlearning of specific tasks without performance loss.
The method outperforms existing approaches in privacy and continual learning metrics.
Abstract
Lifelong learning algorithms enable models to incrementally acquire new knowledge without forgetting previously learned information. Contrarily, the field of machine unlearning focuses on explicitly forgetting certain previous knowledge from pretrained models when requested, in order to comply with data privacy regulations on the right-to-be-forgotten. Enabling efficient lifelong learning with the capability to selectively unlearn sensitive information from models presents a critical and largely unaddressed challenge with contradicting objectives. We address this problem from the perspective of simultaneously preventing catastrophic forgetting and allowing forward knowledge transfer during task-incremental learning, while ensuring exact task unlearning and minimizing memory requirements, based on a single neural network model to be adapted. Our proposed solution, privacy-aware lifelong…
Peer Reviews
Decision·ICLR 2025 Poster
1. A novel experimental setup is proposed to combine lifelong learning and machine unlearning, addressing key challenges in both domains, which have not been addressed to date. 2. This paper presents privacy-aware lifelong learning (PALL) as a memory-efficient algorithmic solution to this setup, which enables learning without catastrophic forgetting, allows learnable forward knowledge transfer, and ensures exact unlearning guarantees by design. 3. The algorithm empirically demonstrates the sca
1. This paper presents a new problem setting, but this problem lacks significant innovation compared with existing problems. What are the main points that distinguish this problem from others? It seems the objective of this problem in Section 3.2 also holds for existing problems such as domain-incremental learning and unlearning. Are there any challenges specific to this problem that require formulating it as a new problem? 2. The methods presented for the problem setting proposed in this
- The paper studies an important problem and presents a novel solution. To the best of my knowledge, this work is the first to provide an exact unlearning solution in a lifelong setting of interleaved learned and unlearned tasks. - The paper is for the most part well-written and easy to follow. - The experimental evaluation covers various relevant baselines and ablations. - The proposed method performs well simultaneously in terms of several metrics and desiderata compared to the baselines co
- It seems that the “independent subnetworks without knowledge transfer” ablation actually performs very similarly to the proposed approach (in Table 3), making it harder to motivate the significantly-more-complex variant for the additional 1% accuracy? Especially given that no confidence intervals are reported there, it’s hard to tell if these differences are significant and whether they justify the additional complexity. Are there perhaps other sequences of tasks / datasets where the “knowledg
- The paper is well-written - The proposed method is sound and does well on the experiments. - Modeling data in terms of tasks is a sound approach. - Memory-efficiency is a good plus. Learning a subset of parameters per task is practical and performs well on experiments. - Unlearning is possible and done by memory buffer replay with logits regularization. The performance of unlearning is shown empirically to be exact, which is one of the benefits of PALL. - A better metric is proposed for measur
**Main**: - The algorithm is not necessarily always memory-efficient since the memory complexity still grows linearly in the number of tasks. In other words, the improvement is a constant, albeit a very small one given that it's stored in 1-bit format. However, some models nowadays are trained with fewer bits (e.g., 8 bits), so storing masks becomes non-trivially expensive when the number of tasks is very large (say, 50). - The tasks are given explicitly. While this might be the case sometimes in
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Privacy-Preserving Technologies in Data · Age of Information Optimization
