Federated Learning with a Single Shared Image
Sunny Soni, Aaqib Saeed, Yuki M. Asano

TL;DR
This paper introduces a federated learning method that distills knowledge using only a single shared image, together with an adaptive dataset pruning algorithm that selects the most informative crops of that image. This supports heterogeneous client models without requiring a large shared dataset.
Contribution
The paper presents a novel federated learning approach that relies on a single shared image and adaptive cropping, facilitating knowledge distillation with minimal shared data and supporting heterogeneous models.
Findings
A single shared image improves distillation efficiency.
Adaptive cropping selects the most informative image parts.
The method supports heterogeneous client architectures.
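The reason distillation-based aggregation supports heterogeneous architectures is that the server only needs each client's predictions on the shared inputs, never its weights. The following is a minimal sketch of that idea, not the authors' implementation. It assumes a FedDF-style scheme in which client logits are averaged and the server model is trained with a KL-divergence loss; the function names `ensemble_soft_targets` and `distillation_loss` are hypothetical.

```python
import numpy as np

def softmax(logits, axis=-1):
    """Numerically stable softmax."""
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def ensemble_soft_targets(client_logits):
    """Average the clients' logits on the shared inputs, then normalize.

    client_logits: list of arrays, each of shape (num_inputs, num_classes).
    Only the output shape must match across clients, so the clients may
    use different architectures. Logit averaging is an assumption here.
    """
    mean_logits = np.mean(np.stack(client_logits), axis=0)
    return softmax(mean_logits)

def distillation_loss(student_logits, teacher_probs, eps=1e-12):
    """Mean KL(teacher || student) over the shared inputs."""
    student_probs = softmax(student_logits)
    kl = np.sum(
        teacher_probs * (np.log(teacher_probs + eps) - np.log(student_probs + eps)),
        axis=1,
    )
    return float(kl.mean())
```

In a full system the server would minimize `distillation_loss` with respect to its own model's parameters; the sketch only shows how the targets and the loss value are formed.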
Abstract
Federated Learning (FL) enables multiple machines to collaboratively train a machine learning model without sharing private training data. Yet, especially for heterogeneous models, a key bottleneck remains the transfer of knowledge gained by each client model to the server. One popular method, FedDF, tackles this task with distillation, using a common, shared dataset on which predictions are exchanged. However, in many contexts such a dataset might be difficult to acquire due to privacy concerns, and the clients might not allow for storage of a large shared dataset. To this end, we introduce a new method that improves this knowledge distillation approach so that it relies on only a single shared image between clients and server. In particular, we propose a novel adaptive dataset pruning algorithm that selects the most informative crops generated from only a single image. With…
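The abstract's core mechanism (many crops from one image, then pruning to the most informative ones) can be illustrated with a short sketch. This is an assumption-laden illustration, not the paper's algorithm: it uses uniform random crops and scores each crop by the entropy of the predictions on it. Whether high- or low-entropy crops count as "most informative" is left as a parameter, since the summary does not say. The names `random_crops`, `prediction_entropy`, and `prune_crops` are hypothetical.

```python
import numpy as np

def random_crops(image, num_crops, crop_size, rng):
    """Cut square crops at uniformly random positions from one image.

    image: array of shape (H, W, C); returns (num_crops, crop_size, crop_size, C).
    """
    h, w = image.shape[:2]
    tops = rng.integers(0, h - crop_size + 1, size=num_crops)
    lefts = rng.integers(0, w - crop_size + 1, size=num_crops)
    return np.stack(
        [image[t:t + crop_size, l:l + crop_size] for t, l in zip(tops, lefts)]
    )

def prediction_entropy(probs, eps=1e-12):
    """Shannon entropy of each row of class probabilities."""
    return -np.sum(probs * np.log(probs + eps), axis=1)

def prune_crops(crops, probs, keep, highest=True):
    """Keep the `keep` crops with the highest (or lowest) prediction entropy.

    probs: array of shape (num_crops, num_classes), e.g. model predictions
    on the crops. The selection direction is an assumption of this sketch.
    """
    order = np.argsort(prediction_entropy(probs))
    idx = order[::-1][:keep] if highest else order[:keep]
    return crops[idx], idx
```

A typical call would generate crops with `rng = np.random.default_rng(0)`, obtain `probs` from a model on those crops, and pass both to `prune_crops` to build the reduced distillation set.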
Peer Reviews
Decision · ICLR 2024 Conference Withdrawn Submission
* Using dataset pruning and single data KD is new in federated learning.
* Authors show results with different model architectures and domain datasets, which is valuable and interesting.
* The evaluations show the practicality of the method for the target datasets.

* Authors should consider more recent baselines for KD-based FL methods.
* Could you please elaborate on how your method differs from synthetic data generation (by the server or clients) or dataset distillation in federated learning?
* Computation cost, especially for clients, is missing.
Knowledge distillation under the FL framework is an interesting area for research, as it allows heterogeneous client model architectures to be aggregated at the central server.

- The writing in general requires significant improvement, with quite a lot of grammar mistakes and some confusing sentences.
- The main method of the paper is not presented well. Normally, in the method section (section 3), the authors should state the problem setup, the objective of the problem with clear definitions, etc. It also lacks detailed references, for example on whether patchification techniques have been used previously in KD methods. Again, some notations in the 'entropy selecti
1. It is impressive that only one image is needed to perform KD.
2. The provided dataset pruning strategies are helpful.

1. What would happen if we increase the number of KD images? I would appreciate it if the authors could provide more related ablation results.
2. Comparisons against FedAvg and other federated distillation baselines are missing.
3. Some data-free approaches, such as (Zhu et al., 2021b) and “DENSE: Data-Free One-Shot Federated Learning” [NeurIPS 2022], have completely removed any shared image between server and client, so what would be the unique advantages of using a single image in this work?
1. It is interesting to apply knowledge distillation-based aggregation with a single image in federated learning.
2. The authors give a comprehensive discussion of the experiments, and the results seem reasonable and promising.

1. Following from the first advantage, I think this work is a bit overclaimed. In my opinion, the size of a single shared image should be the same as that of the training images. However, in this paper, the shared image is of high resolution, which misleads the readers.
2. Following from the first point, I am not convinced why a high-resolution image is more obtainable than a public dataset whose size matches that of the training data. According to Table 1, the high-resolution image cannot be randomly ge
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Stochastic Gradient Optimization Techniques · Cryptography and Data Security
Methods: Dataset Pruning · Knowledge Distillation · Pruning
