Flashback: Understanding and Mitigating Forgetting in Federated Learning
Mohammed Aljahdali, Ahmed M. Abdelmoniem, Marco Canini, Samuel Horváth

TL;DR
This paper investigates forgetting in federated learning caused by data heterogeneity, introduces a new metric for measuring forgetting, and proposes Flashback, a dynamic distillation algorithm that improves convergence and knowledge retention.
Contribution
The paper presents a novel metric for granular measurement of forgetting and introduces Flashback, an FL algorithm with dynamic distillation to mitigate forgetting and enhance learning efficiency.
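As a rough illustration of what measuring forgetting "granularly" can mean, the sketch below scores forgetting per class between two evaluation points and clips gains to zero, so that accuracy gained on some classes cannot hide accuracy lost on others. This is a minimal sketch under stated assumptions, not the paper's exact definition: the function names `per_class_accuracy` and `forgetting_score`, and the choice of averaging clipped per-class accuracy drops, are illustrative.

```python
import numpy as np

def per_class_accuracy(y_true, y_pred, num_classes):
    """Accuracy computed separately for each class (NaN if a class has no samples)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    acc = np.full(num_classes, np.nan)
    for c in range(num_classes):
        mask = y_true == c
        if mask.any():
            acc[c] = np.mean(y_pred[mask] == c)
    return acc

def forgetting_score(acc_before, acc_after):
    """Mean per-class accuracy drop (assumed form, for illustration).

    Gains are clipped to zero, so improvement on one class
    cannot offset a loss on another. Classes with NaN accuracy are ignored.
    """
    drops = np.maximum(0.0, np.asarray(acc_before) - np.asarray(acc_after))
    return float(np.nanmean(drops))

# Hypothetical per-class accuracies before and after one round.
before = [0.9, 0.8, 0.1]
after = [0.5, 0.8, 0.9]
print("mean accuracy change:", np.mean(after) - np.mean(before))
print("forgetting score:", forgetting_score(before, after))
```

In this made-up example the mean accuracy rises, yet the score is still positive because the first class lost accuracy. A plain difference of average accuracies would miss that loss.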
Findings
Flashback outperforms existing methods across benchmarks.
It reduces the number of rounds to reach target accuracy.
It effectively mitigates knowledge loss in heterogeneous data settings.
Abstract
In Federated Learning (FL), forgetting, or the loss of knowledge across rounds, hampers algorithm convergence, particularly in the presence of severe data heterogeneity among clients. This study explores the nuances of this issue, emphasizing the critical role of forgetting in FL's inefficient learning within heterogeneous data contexts. Knowledge loss occurs in both client-local updates and server-side aggregation steps; addressing one without the other fails to mitigate forgetting. We introduce a metric to measure forgetting granularly, ensuring distinct recognition amid new knowledge acquisition. Leveraging these insights, we propose Flashback, an FL algorithm with a dynamic distillation approach that is used to regularize the local models, and effectively aggregate their knowledge. Across different benchmarks, Flashback outperforms other methods, mitigates forgetting, and achieves…
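The idea of using distillation as a regularizer whose strength adapts to what each model has seen can be sketched as follows. This is a hedged illustration, not the Flashback algorithm itself: the per-class weight `alpha` derived from label counts, the names `student_counts` and `teacher_counts`, and the exact form of the loss are all assumptions made for this example.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax with the usual max-subtraction for numerical stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def dynamic_distillation_loss(student_logits, teacher_logits, labels,
                              student_counts, teacher_counts, eps=1e-12):
    """Cross-entropy plus a class-weighted distillation term (illustrative form).

    alpha[c] grows when the teacher has seen class c more often than the student,
    so the student is pulled toward the teacher mostly on classes it lacks.
    """
    p_s = softmax(np.asarray(student_logits, dtype=float))
    p_t = softmax(np.asarray(teacher_logits, dtype=float))
    labels = np.asarray(labels)
    s = np.asarray(student_counts, dtype=float)
    t = np.asarray(teacher_counts, dtype=float)

    # Per-sample cross-entropy on the local labels.
    ce = -np.log(p_s[np.arange(len(labels)), labels] + eps)

    # Assumed weighting rule: relative share of the teacher's label count per class.
    alpha = t / (s + t + eps)

    # KL(teacher || student) with each class term scaled by alpha[c].
    kl = np.sum(alpha * p_t * (np.log(p_t + eps) - np.log(p_s + eps)), axis=1)
    return float(np.mean(ce + kl))

# Hypothetical batch: the local client has no samples of class 1.
student = np.array([[2.0, 0.5, -1.0], [0.1, 0.2, 0.3]])
teacher = np.array([[0.0, 3.0, 0.0], [1.0, 1.0, 1.0]])
labels = np.array([0, 2])
print(dynamic_distillation_loss(student, teacher, labels,
                                student_counts=[50, 0, 10],
                                teacher_counts=[50, 100, 10]))
```

With this weighting, a class absent from the client's data gets `alpha` close to 1, so the teacher's prediction dominates there. A teacher with no label counts contributes nothing, and the loss reduces to plain cross-entropy.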
Peer Reviews
Decision: ICLR 2025 Conference Withdrawn Submission
1. The non-IID problem in FL is important, and using distillation to solve this issue achieves promising results. 2. The writing is easy to follow.
The major concern is that the experiments are too limited: 1. All baselines are too old. The newest baseline, FedReg, was proposed in 2022, while many recent baselines for solving the non-IID problem were proposed in 2023 and 2024. 2. The evaluations should be conducted on different non-IID settings. This paper only adopts a fixed non-IID setting across all datasets, which is not sufficient to demonstrate the effectiveness of the proposed method. 3. Larger models such as ResNet18 can be include
This paper deals with data heterogeneity in federated learning from the perspective of forgetting, which is very novel, and gives a comprehensive framework spanning observation, derivation, solution design, and experimental proof. It is well written and easy to understand.
1. The coverage of related work on data heterogeneity in federated learning is insufficient, especially for personalized federated learning and clustered federated learning. 2. The elaboration of the concept of forgetting is not detailed enough: it lacks comparisons between different categories of continual learning (class-CL, task-CL, domain-CL) and a detailed elaboration of forgetting mechanisms (e.g., weight drift, activation drift, inter-task confusion, and task-recency bias). 3. The for
1. The authors provide a detailed analysis and illustration of a key issue, namely where forgetting occurs in non-IID FL, which is an important problem in FL. 2. The proposed Flashback method is simple and easy to understand, and it outperforms a variety of FL baselines. 3. This paper introduces fine-grained evaluation metrics for forgetting in FL.
1. This paper proposes a fine-grained metric for assessing forgetting, but this does not seem to be reflected in Flashback. The authors should emphasize the connection between this metric and Flashback. 2. I have concerns about the use of the public dataset on the server. If the distribution of that dataset is similar to the distribution of data on each client, does this mean that there is already a leakage problem in that environment? Also, I'm curious what happens if a different dataset is us
1. The non-IID problem in FL is important, and using distillation to solve this issue achieves promising results. 2. The writing is easy to follow.
1. The novelty is limited. In fact, there are already many methods that use knowledge distillation to solve the non-IID problem, e.g., [1,2]. The experiments should also include them for a comprehensive comparison. [1] DaFKD: Domain-Aware Federated Knowledge Distillation. CVPR 2023. [2] Data-free knowledge distillation for heterogeneous federated learning. ICML 2021. 2. It is better to include a figure to illustrate the method framework for ease of understanding. 3. It would be better
(a) The method includes comparisons with a variety of baselines that incorporate regularization and distillation techniques. (b) The paper is well-written, with clear presentation and structure that is easy to follow.
(a) The investigation of forgetting is less systematic than claimed. While the paper frames forgetting as a key factor in FL underperformance, it lacks detailed analysis and comparisons with baselines from continual learning, where many regularization-based methods [1] effectively mitigate forgetting. Exploring whether these methods can similarly address forgetting in FL would be valuable. (b) Fairness in performance comparison on public datasets is an issue, as the proposed method uses a port
Taxonomy
Topics: Privacy-Preserving Technologies in Data
