Federated Learning for Large Models in Medical Imaging: A Comprehensive Review
Mengyu Sun
Ziyuan Yang
Yongqiang Huang
Hui Yu
Yingyu Chen
Shuren Qi
Andrew Beng Jin Teoh, Senior Member, IEEE
Yi Zhang, Senior Member, IEEE
This work did not involve human subjects or animals in its research. Corresponding author: Yi Zhang.
M. Sun, Z. Yang, Y. Huang, and Y. Zhang are with the School of Cyber Science and Engineering, Sichuan University, Chengdu 610065, China.
H. Yu is with Sichuan Institute of Computer Sciences, Chengdu 610042, China.
Y. Chen is with the College of Computer Science, Sichuan University, Chengdu 610065, China.
S. Qi is with the Department of Mathematics, The Chinese University of Hong Kong, Hong Kong, China.
A. B. J. Teoh is with the School of Electrical and Electronic Engineering, College of Engineering, Yonsei University, Seoul, Republic of Korea.
Abstract
Artificial intelligence (AI) has demonstrated considerable potential in the realm of medical imaging. However, the development of high-performance AI models typically necessitates training on large-scale, centralized datasets. This approach is confronted with significant challenges due to strict patient privacy regulations and legal restrictions on data sharing and utilization. These limitations hinder the development of large-scale models in medical domains and impede continuous updates and training with new data. Federated Learning (FL), a privacy-preserving distributed training framework, offers a new solution by enabling collaborative model development across fragmented medical datasets. In this survey, we review FL’s contributions at two stages of the full-stack medical analysis pipeline. First, in upstream tasks such as CT or MRI reconstruction, FL enables joint training of robust reconstruction networks on diverse, multi-institutional datasets, alleviating data scarcity while preserving confidentiality. Second, in downstream clinical tasks like tumor diagnosis and segmentation, FL supports continuous model updating by allowing local fine-tuning on new data without centralizing sensitive images. We comprehensively analyze FL implementations across the medical imaging pipeline, from physics-informed reconstruction networks to diagnostic AI systems, highlighting innovations that improve communication efficiency, align heterogeneous data, and ensure secure parameter aggregation. Meanwhile, this paper provides an outlook on future research directions, aiming to serve as a valuable reference for the field’s development.
Index Terms
Federated Learning, Medical Imaging, Medical Image Analysis, Large Models
1 Introduction
In recent years, the rapid advancement of artificial intelligence (AI) has demonstrated immense potential across diverse domains [1, 2, 3]. Fig. 1 illustrates the evolution of medical image analysis and reconstruction, highlighting key advancements in the field. In medical imaging, AI technologies are increasingly transforming modern methods for medical image analysis and processing [4]. However, most existing AI approaches remain data-driven and require large-scale, high-quality, well-annotated datasets for training [5, 6]. Medical image annotation requires manual labeling by domain experts, such as radiologists and pathologists—a process that is both cost-prohibitive and time-intensive [7].
Unlike medical images, natural images can be labeled by non-experts: crowd workers can handle routine tasks such as object detection, classification, and segmentation. This ease of annotation enables the rapid construction of large-scale datasets via crowdsourcing platforms. In contrast, medical images require well-trained radiologists or clinicians, which inherently limits the speed and scale of labeling. Further complicating the issue, medical images contain sensitive patient information governed by privacy laws, storage limitations, and institutional data governance policies [8]. Consequently, the very factors that have fueled the success of large foundation models in general-purpose computer vision, namely abundant data and open sharing, are largely absent in the medical domain. Thus, addressing data scarcity while protecting patient privacy remains a critical bottleneck and a pressing avenue for innovation.
Collaborative machine learning across multiple data owners, with a focus on preserving data privacy, has garnered substantial attention from both academia and industry. To enable privacy-preserving machine learning, McMahan et al. [9] propose Federated Learning (FL), a distributed learning framework, together with its canonical aggregation algorithm, Federated Averaging (FedAvg). Because raw data never leave the participating sites, FL has been widely adopted in various scenarios [10]. In FL, clients independently train local models using their own data and upload model parameters or gradients to a central server. The server aggregates these updates to refine a global model, which is then redistributed to the clients for subsequent training rounds. During the entire training process, clients’ data remains local, with only model parameters or gradient updates transmitted to the central server. This mechanism mitigates data leakage risks and strengthens privacy protection capabilities [11].
In smart healthcare systems, workflows typically include both upstream medical image reconstruction and downstream image analysis tasks [12]. However, the nature and impact of data heterogeneity vary significantly between these two categories.
Medical image reconstruction primarily focuses on restoring high-quality images from low-quality or incomplete imaging data [13]. For instance, due to the potential harm of X-ray radiation, clinical protocols often mandate reduced radiation doses during medical examinations. However, this reduction inevitably leads to degraded image quality.
In low-dose (LD) computed tomography (CT), different healthcare institutions may employ various scanner models or LD protocols (e.g., scanning angles, X-ray photon intensities), resulting in distinct noise distribution patterns [14]. This inconsistency in data distribution hinders the generalizability of conventional deep learning (DL) models across clinical settings, consequently affecting reconstruction stability and accuracy.
In contrast, magnetic resonance imaging (MRI) often adopts accelerated acquisition protocols to shorten scanning time and improve patient comfort. However, variations in MRI scanner hardware configurations (e.g., magnetic field strengths, signal acquisition protocols) and institution-specific reconstruction algorithms collectively contribute to MRI data heterogeneity [15].
Data heterogeneity in medical image analysis primarily stems from three fundamental sources: (1) demographic distribution discrepancies across hospital populations [16], (2) variation in histopathological data processing protocols [17], and (3) imbalanced disease prevalence ratios [18]. Specifically, individual factors such as age, gender, and ethnicity contribute to anatomical variability and diverse lesion characteristics in medical images. Meanwhile, differences in histopathological preparation, including staining protocols and digital scanning devices, further shift data distributions. In addition, disease prevalence varies significantly across institutions: specialized hospitals focus on particular disease groups, while general hospitals serve more diverse populations, with disease severity also varying among medical centers.
Consequently, addressing data heterogeneity [19] in both imaging and analytical tasks has emerged as a critical research frontier in medical image analysis. The main challenge lies in developing methods that simultaneously mitigate model drift caused by divergent optimization trajectories in FL processes and enhance the generalization of AI models across institutions. To address these challenges, this review provides a technical analysis of state-of-the-art solutions tailored to imaging-oriented and analysis-driven FL frameworks.
In recent years, several comprehensive surveys on FL in medical imaging have been published.
For example, Guan et al. [20] provide a comprehensive survey of FL methods in medical image analysis, which categorizes approaches into client-side, server-side, and communication techniques. Hernandez-Cruz et al. [21] similarly review FL research in medical imaging, highlighting applications (e.g., cardiology, dermatology, oncology) and recurring challenges, such as non-IID data distributions and privacy preservation.
Silva et al. [22] offer a systematic survey on medical imaging modalities (MRI, CT, X‑ray, histology), which discusses the applications, contributions, limitations, and challenges of FL in these domains.
Wang et al. [23] investigate FL specifically for rare disease detection and summarize existing AI techniques and available datasets for this niche application.
In a related area, Shi et al. [24] explore the trustworthiness aspects of foundation models in medical image analysis, a topic that complements FL surveys by focusing on privacy, robustness, and fairness in large pretrained models. Finally, Raza et al. [25] conduct a PRISMA-based meta-survey of FL in radiomics, which aggregates trends in tumor detection, organ segmentation, and disease classification tasks.
While these reviews provide valuable insights, their scope often fails to cover the entire imaging pipeline. Existing surveys typically examine image reconstruction, segmentation, and diagnosis as separate topics, rather than treating them as interconnected stages in a federated workflow. Furthermore, the integration of emerging large medical foundation models and advanced data compression techniques has received limited systematic attention in the context of FL across the entire imaging chain. This paper aims to bridge these gaps. We focus on integrating FL throughout the end-to-end medical imaging pipeline, starting from physics-driven image reconstruction to downstream analysis tasks. Additionally, we explore opportunities to incorporate large-scale vision models and efficient data compression techniques into FL frameworks tailored for this comprehensive workflow. A comparison with previous surveys is summarized in Table 1, highlighting the expansive scope of this review.
The remainder of this paper is organized as follows: Section 2 introduces FL workflows and outlines the associated challenges; Section 3 reviews existing FL research in medical image reconstruction; Section 4 analyzes FL applications in medical image analysis; Section 5 elucidates persistent technical bottlenecks and clinical implementation challenges and proposes future research directions; finally, we summarize the key findings and contributions.
2 Preliminaries on FL
In medical imaging applications, FL offers a privacy-preserving paradigm that allows decentralized healthcare institutions to jointly train a robust model without sharing patient data. The overall workflow can be organized into four key stages as follows.
(1) Initialization
First, the central server defines the FL task and its requirements, then identifies and invites the participating clients. After establishing collaboration agreements, the server initializes the global model and distributes the initial parameters to all clients, ensuring synchronized initialization of local models throughout the federated network.
(2) Local Training
Upon receiving the model from the server, either during initialization or at the commencement of each communication round, clients leverage local computational resources to train the model on their private datasets, aiming to minimize the loss function of the local model. While some FL approaches may involve multiple local models with distinct loss functions that regularize the divergence between local and global models to mitigate deviation and preserve consistency, this discussion focuses specifically on scenarios that employ a single model architecture. For the sake of analytical uniformity, the loss function considered here includes only task-specific objectives and excludes any additional regularization terms.
The local model update is performed as follows:
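$$\theta_k = \arg\min_{\theta} \mathcal{L}(\theta; D_k)$$
where $\mathcal{L}$ denotes the task-specific loss function (e.g., cross-entropy loss for classification tasks), $\theta$ denotes the model parameters, $D_k$ denotes the private dataset of the $k$-th client, and $\theta_k$ represents the parameters of the $k$-th client’s local model.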
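As a minimal sketch of the local training stage, assume a toy client whose model is a single scalar weight θ in y ≈ θx and whose task loss is the mean squared error over its private pairs. A few epochs of full-batch gradient descent then approximate the local minimizer. The function `local_train`, its learning rate, and its epoch count are hypothetical choices for illustration and do not belong to any specific FL method.

```python
def local_train(theta, data, lr=0.1, epochs=50):
    """Approximate argmin_theta L(theta; D_k) by full-batch gradient descent.

    Hypothetical toy setting: scalar linear model y ~ theta * x with MSE loss
    L(theta) = mean((theta * x - y) ** 2) over the client's private pairs (x, y).
    """
    n = len(data)
    for _ in range(epochs):
        # dL/dtheta = (2 / n) * sum((theta * x - y) * x)
        grad = 2.0 / n * sum((theta * x - y) * x for x, y in data)
        theta -= lr * grad
    return theta


# A client whose data follow y = 3x starts from the received global model theta = 0.
client_data = [(1.0, 3.0), (2.0, 6.0), (0.5, 1.5)]
theta_k = local_train(0.0, client_data)
print(round(theta_k, 3))  # → 3.0
```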
(3) Model Aggregation
After receiving local model parameters from multiple clients, the central server aggregates them to generate an updated global model. In the FedAvg algorithm, the global model is updated by performing weighted averaging of parameters uploaded by clients [26]. Specifically, FedAvg calculates weighting coefficients based on the data volume of each client, ensuring that clients with larger datasets contribute more proportionally to the global model during aggregation. This mechanism aligns the aggregation process with the relative significance of each client’s data distribution to the global model’s optimization trajectory.
The following formula represents this process:
$$\theta^{t+1} = \sum_{k=1}^{K} \frac{|D_k|}{|D|}\,\theta_k^{t}$$
where $\theta_k^{t}$ denotes the model parameters of the $k$-th client in the $t$-th round, $K$ represents the total number of clients, $|D_k|$ and $|D| = \sum_{k=1}^{K} |D_k|$ indicate the sample size of the $k$-th client and of the total dataset, respectively, and $\theta^{t+1}$ represents the global model of the $(t+1)$-th round.
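The weighted averaging step can be sketched in a few lines. The sketch assumes that each client uploads a flat list of parameters together with its sample count. The function name `fedavg_aggregate` is hypothetical, and real frameworks average per-tensor state dictionaries rather than flat lists.

```python
def fedavg_aggregate(client_params, client_sizes):
    """Weighted average of client parameter vectors with weights |D_k| / |D|."""
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need at least one client and one sample size per client")
    total = sum(client_sizes)  # |D| = sum_k |D_k|
    dim = len(client_params[0])
    global_params = [0.0] * dim
    for params, size in zip(client_params, client_sizes):
        weight = size / total
        for i in range(dim):
            global_params[i] += weight * params[i]
    return global_params


# Three hospitals with 100, 300, and 600 samples upload 2-parameter models.
theta_next = fedavg_aggregate([[1.0, 0.0], [2.0, 1.0], [4.0, 2.0]], [100, 300, 600])
print(theta_next)
```

With these sample counts the weights are 0.1, 0.3, and 0.6, so the largest site pulls the global model furthest toward its own parameters.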
(4) Model Update
After aggregating the parameters, the central server updates the global model and distributes the updated parameters to all clients. Upon receiving the updated global model, clients update their local models and initiate a new round of local training. This iterative process repeats the aggregation-training cycle until the training phase is complete.
The overall FL framework for medical imaging and image analysis is illustrated in Fig. 2.
Not all FL algorithms strictly follow this standard workflow. Depending on the application context and underlying theoretical assumptions, some methods selectively refine specific components of the pipeline. For instance, knowledge distillation (KD) [27] can replace gradient exchange in some cases to leverage heterogeneous client data for training more robust and generalizable models [28]. Despite these variations, all these approaches share a common goal: enabling collaborative learning across distributed data while preserving data privacy.
However, as noted earlier, real-world healthcare settings inherently involve variations in geographic location, population demographics, and clinical protocols across institutions, leading to locally collected data with non-identical distributions [29]. As a result, medical data from different organizations often exhibit varying degrees of feature and label shifts. Addressing the challenges posed by non-Independent and Identically Distributed (non-IID) data is crucial for the effective deployment of FL in smart healthcare systems.
For example, FedProx [30], often regarded as an enhanced version of FedAvg, adds a proximal regularizer to the task-specific loss during local optimization. This regularizer penalizes the discrepancy between local and global model parameters to prevent model drift. Similarly, Li et al. [31] propose the MOdel-cONtrastive (MOON) method, which uses a contrastive loss to pull the feature representations of the local model toward those of the global model, and away from those of the previous-round local model, to mitigate model divergence. The key difference lies in FedProx’s parameter-level constraints versus MOON’s feature-level constraints. Additionally, some approaches mitigate data heterogeneity through improved optimizer designs [32]. Beyond parameter transmission in conventional frameworks, some studies employ KD to facilitate global knowledge learning through transmitted knowledge representations [33].
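The sketch below contrasts the two kinds of constraint. It writes the FedProx local objective, which is the task loss plus (μ/2)‖θ − θ_global‖², and a MOON-style contrastive term on a single sample's representations that uses cosine similarity and a temperature τ. The function names and τ = 0.5 are assumptions made for illustration. The supervised loss that MOON adds to this term is omitted.

```python
import math


def fedprox_objective(task_loss, theta, theta_global, mu):
    """FedProx local objective: task loss + (mu / 2) * ||theta - theta_global||^2."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(theta, theta_global))
    return task_loss + 0.5 * mu * sq_dist


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def moon_contrastive_loss(z_local, z_global, z_prev, tau=0.5):
    """MOON-style model-contrastive term for one sample's representations.

    Positive pair: current local vs. global-model representation.
    Negative pair: current local vs. previous-round local-model representation.
    """
    pos = math.exp(cosine(z_local, z_global) / tau)
    neg = math.exp(cosine(z_local, z_prev) / tau)
    return -math.log(pos / (pos + neg))


# Parameter-level: the penalty grows as local weights drift from the global ones.
print(fedprox_objective(0.8, [1.0, 2.0], [1.0, 1.0], mu=0.1))
# Feature-level: the loss is small when z_local is closer to z_global than to z_prev.
print(moon_contrastive_loss([1.0, 0.0], [0.9, 0.1], [0.0, 1.0]))
```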
To address data heterogeneity, several methods have proposed personalized FL. Unlike conventional FL, which trains a single global model, personalized FL allows each client, such as individual hospitals or medical devices, to maintain a model tailored to its local data distribution. For example, Li et al. [34] propose FedBN, which alleviates feature shift by retaining client-specific Batch Normalization (BN) layers. While all other model parameters are aggregated globally, BN layers remain localized to preserve institution-specific characteristics.
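FedBN's partial aggregation can be sketched as FedAvg restricted to the non-BN parameters. The sketch assumes toy state dictionaries of scalar parameters and a hypothetical naming convention in which BN entries start with `bn`. Each client receives the averaged shared parameters but keeps its own BN entries.

```python
def is_bn(name):
    """Hypothetical convention: BatchNorm parameters are named 'bn.*'."""
    return name.startswith("bn")


def fedbn_aggregate(client_states, client_sizes):
    """FedAvg over shared parameters only; BN parameters stay client-specific.

    client_states: list of dicts mapping parameter name -> float (toy scalars).
    Returns one personalized state dict per client.
    """
    total = sum(client_sizes)
    shared_names = [name for name in client_states[0] if not is_bn(name)]
    shared = {
        name: sum(s[name] * size / total for s, size in zip(client_states, client_sizes))
        for name in shared_names
    }
    personalized = []
    for state in client_states:
        new_state = dict(state)   # local BN entries are kept untouched
        new_state.update(shared)  # shared entries are replaced by the average
        personalized.append(new_state)
    return personalized


states = [{"conv.weight": 1.0, "bn.scale": 0.5}, {"conv.weight": 3.0, "bn.scale": 2.0}]
for s in fedbn_aggregate(states, [1, 1]):
    print(s)
# → {'conv.weight': 2.0, 'bn.scale': 0.5}
# → {'conv.weight': 2.0, 'bn.scale': 2.0}
```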
Shamsian et al. [35] develop pFedHN, a hypernetwork-based method that generates full local model parameters through a central hypernetwork. Although effective for lightweight models, it scales poorly to medical imaging tasks. Because the hypernetwork must output every parameter of the target model, its own parameter count grows with the large models common in this domain, which becomes prohibitive and degrades performance. Building on the ideas of pFedHN and FedBN, Li et al. [36] introduce a method that utilizes hypernetworks to generate personalized projection matrices for self-attention layers, enabling client-specific queries, keys, and values. However, this approach is restricted to Transformer-based architectures, thereby limiting its applicability across diverse model types. To improve collaboration across clients, Zhao et al. [37] propose learning individualized feature spaces, enabling the identification of models compatible with personalized knowledge sharing.
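A rough parameter count shows why generating full model weights with a hypernetwork scales poorly. The sketch assumes a simplified hypernetwork with one hidden layer and a single linear output head that emits every target parameter. pFedHN's actual architecture differs in detail. The 100k and 30M target sizes are hypothetical stand-ins for a lightweight classifier and a large medical imaging network.

```python
def hypernet_param_count(embed_dim, hidden_dim, target_params):
    """Parameters of a toy hypernetwork: client embedding -> hidden layer -> linear head.

    The head must emit every parameter of the target (client) model, so its
    weight matrix alone has hidden_dim * target_params entries.
    """
    trunk = embed_dim * hidden_dim + hidden_dim         # hidden layer weights + bias
    head = hidden_dim * target_params + target_params   # output head weights + bias
    return trunk + head


small = hypernet_param_count(embed_dim=32, hidden_dim=100, target_params=100_000)
large = hypernet_param_count(embed_dim=32, hidden_dim=100, target_params=30_000_000)
print(f"{small:,} vs {large:,}")  # → 10,103,300 vs 3,030,003,300
```

The head size grows linearly with the target model, so a 300-fold larger network needs a hypernetwork roughly 300 times larger.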
Most of the aforementioned methods are primarily developed for natural image tasks and tend to exhibit suboptimal performance when directly applied to medical imaging. As a result, designing FL algorithms explicitly adapted to the distinctive properties of medical images remains a pressing and unresolved challenge in this field.
3 Federated Learning in Medical Imaging
FL in medical imaging, as outlined above, unifies diverse institutions into a privacy‑preserving collaborative network. Unlike generic computer vision problems, medical image reconstruction and analysis are inherently physics-driven and modality-specific. For instance, variations in CT scanner protocols or MRI acquisition parameters directly influence noise patterns and reconstruction stability. Thus, mathematically modeling tasks, such as LDCT denoising or accelerated MRI reconstruction, provides the foundation for developing FL-compatible solutions that reconcile distributed data constraints with clinical fidelity requirements.
3.1 Problem Modeling
The medical image reconstruction task [38] can be formulated as:
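$$\hat{x} = \arg\min_{x} \|Ax - y\|_2^2 + \lambda R(x)$$
where $\|\cdot\|_2^2$ denotes the squared $\ell_2$ norm, $A \in \mathbb{R}^{m \times n}$ represents the system matrix, $x \in \mathbb{R}^{n}$ is the image to be reconstructed, $y \in \mathbb{R}^{m}$ contains the measurement data, $R(\cdot)$ denotes the regularization term, typically constructed based on prior knowledge, and $\lambda > 0$ balances data fidelity against regularization.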
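Once a regularizer is chosen, the reconstruction problem can be solved by plain gradient descent. The sketch below assumes the simple Tikhonov prior R(x) = ‖x‖₂², weighted by λ. The gradient of the objective is then 2Aᵀ(Ax − y) + 2λx. In practice, learned or physics-informed priors replace this term. The function name `reconstruct` and its default step size are illustrative. For convergence, the step size must be small relative to the largest eigenvalue of AᵀA + λI.

```python
def reconstruct(A, y, lam=0.1, lr=0.05, iters=500):
    """Gradient descent on ||Ax - y||_2^2 + lam * ||x||_2^2 (assumed prior R(x) = ||x||^2)."""
    m, n = len(A), len(A[0])
    x = [0.0] * n
    for _ in range(iters):
        residual = [sum(A[i][j] * x[j] for j in range(n)) - y[i] for i in range(m)]
        # gradient: 2 A^T (Ax - y) + 2 lam x
        grad = [2 * sum(A[i][j] * residual[i] for i in range(m)) + 2 * lam * x[j]
                for j in range(n)]
        x = [x[j] - lr * grad[j] for j in range(n)]
    return x


A = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]  # toy 3x2 system matrix (m = 3, n = 2)
y = [1.0, 2.0, 2.0]                       # noise-free measurements of x_true = [1, 1]
print(reconstruct(A, y, lam=0.1))         # slightly shrunk toward 0 by the prior
```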
Additionally, the LDCT image denoising [39] problem can be expressed as:
$$\hat{\omega} = \arg\min_{\omega} \|F(x_l, \omega) - x_n\|_2^2$$
where $x_l$ represents the LDCT image, $x_n$ corresponds to the normal-dose CT image, $F(\cdot, \omega)$ denotes the target model designed to map $x_l$ to an approximation of $x_n$, and $\omega$ represents the parameters of the imaging network.
3.2 CT Imaging
The reduction of CT radiation dosage constitutes a critical safety measure to minimize patients’ exposure to potential X-ray hazards [48]. However, dose reduction inevitably amplifies noise and introduces artifacts that compromise image quality [49]. Consequently, recent years have witnessed substantial research efforts dedicated to developing LDCT reconstruction algorithms within FL frameworks, aiming to enhance reconstruction quality while ensuring patient safety.
Contemporary methodologies primarily address the LDCT reconstruction challenge in FL through dual optimization objectives: client-specific adaptation and global imaging feature learning. A representative approach by Yang et al. [40], termed HyperFed, leverages the intrinsic correlation between noise distribution and scanning protocols in LDCT imaging. This method employs a hypernetwork architecture to extract protocol-specific noise characteristics from scanning parameters. To counteract the detrimental effects of data heterogeneity, the framework implements client-specific domain adaptation through personalized hypernetworks that modulate the shared global reconstruction network. This modulation mechanism enables dynamic adaptation to variations in imaging devices and scanning protocols. By explicitly modeling the noise distribution disparities among clients, the approach enhances personalization while improving the generalization capability of FL-optimized LDCT reconstruction systems.
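Protocol-conditioned modulation can be pictured as a hypernetwork that maps a vector of scanning parameters to channel-wise scale and shift values, which are then applied to the shared network's features. The sketch below assumes a linear hypernetwork and FiLM-style affine modulation. The protocol features and weights are hypothetical. HyperFed's actual architecture, and its split between shared and personalized components, are not reproduced here.

```python
def hypernet_modulation(protocol, w_scale, w_shift):
    """Hypothetical linear hypernetwork: protocol vector -> per-channel scale and shift.

    Scale is parameterized as 1 + w.p so that a zero protocol vector gives identity.
    """
    scale = [1.0 + sum(w * p for w, p in zip(row, protocol)) for row in w_scale]
    shift = [sum(w * p for w, p in zip(row, protocol)) for row in w_shift]
    return scale, shift


def modulate(features, scale, shift):
    """FiLM-style channel-wise affine modulation of shared-network features."""
    return [s * f + b for f, s, b in zip(features, scale, shift)]


# Hypothetical protocol features: normalized dose level and fraction of views.
protocol = [0.25, 0.5]
w_scale = [[0.4, -0.2], [0.0, 0.6], [1.0, 0.0]]  # 3 channels x 2 protocol features
w_shift = [[0.1, 0.0], [0.0, -0.1], [0.2, 0.2]]
shared_features = [1.0, 2.0, 3.0]
scale, shift = hypernet_modulation(protocol, w_scale, w_shift)
print(modulate(shared_features, scale, shift))
```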
SCAN-PhysFed [41] introduces dual-level physical modeling that simultaneously addresses scanning protocol variations and patient-specific anatomical features. This framework employs large language models (LLMs) to generate anatomical prompts from radiology reports, which enable anatomy-informed hypernetworks for patient-level adaptation while explicitly modeling physical scanner parameters through orthogonal-constrained hypernetworks.
Chen et al. [42] develop FedCG to leverage multi-client data through two mechanisms: local client-level sinogram learning and cross-client image reconstruction for conditional generalization. This approach aligns measurement domain features through a conditional generalization network on the server, enabling the learning of latent shared characteristics while preserving client-specific features under different conditions.
Concurrently, Xu et al. [43] propose PerFed-LDCT, a personalized FL framework for CT reconstruction, which employs an artifact fusion network comprising two components: client-specific models for domain-specific artifact extraction and a global shared model for generalized image reconstruction. Complementing these approaches, Song et al. [44] propose Federated CycleGAN for privacy-preserving cross-domain translation, implementing domain-specific loss decomposition and AdaIN-based generators that enable decentralized CT reconstruction across heterogeneous scanner protocols without raw data sharing.
Chen et al. [45] introduce FedFDD, which uses specialized networks to separately process the high-frequency and low-frequency components of CT data to address spectral domain variations in LDCT scanning protocols. The method improves reconstruction quality by aggregating global high-frequency information while maintaining localized low-frequency characteristics.
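A frequency split of this kind can be illustrated with a simple low-pass filter. The blurred image carries the low-frequency content, the residual carries the high-frequency content, and the two sum back to the input. The 3×3 box filter is an assumption chosen for brevity, and FedFDD's actual decomposition may use a different transform. In the federated setting described above, the branch that processes `high` would be aggregated globally, while the `low` branch would stay local.

```python
def box_blur(img, radius=1):
    """Low-pass filter: mean over a (2r+1) x (2r+1) window, clamping at borders."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            window = [img[min(max(i + di, 0), h - 1)][min(max(j + dj, 0), w - 1)]
                      for di in range(-radius, radius + 1)
                      for dj in range(-radius, radius + 1)]
            out[i][j] = sum(window) / len(window)
    return out


def split_frequencies(img, radius=1):
    """Return (low, high) components with low + high == img up to rounding."""
    low = box_blur(img, radius)
    high = [[p - l for p, l in zip(row, low_row)] for row, low_row in zip(img, low)]
    return low, high


ct_patch = [[10.0, 12.0, 11.0], [10.0, 50.0, 11.0], [9.0, 12.0, 10.0]]  # toy values
low, high = split_frequencies(ct_patch)
print(round(high[1][1], 2))  # → 35.0 (the isolated bright pixel dominates the high band)
```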
While these methods have demonstrated promising performance, they are generally based on a critical assumption: the availability of perfectly paired LDCT and full-dose CT images [50]. However, in practical clinical scenarios, acquiring precisely matched labeled image pairs remains challenging due to factors such as patient organ motion, variations in scanning times, and equipment-related discrepancies, which consequently restrict the applicability of these approaches.
3.3 MRI Imaging
Beyond CT, MRI is another crucial imaging modality widely used in clinical practice. The primary objective of rapid MRI is to accelerate scanning while maintaining high image quality [51]. Shorter scans improve patient comfort, while advanced signal processing [52] and DL techniques are used to reconstruct undersampled data, suppress artifacts, and improve resolution and contrast [53]. Recent advancements in federated MRI reconstruction have focused on addressing domain shifts, multi-modal harmonization, computational efficiency, and privacy preservation through integrated technical strategies.
A key challenge of MRI imaging lies in mitigating domain discrepancies caused by heterogeneous scanners and imaging protocols. Researchers have proposed several innovative strategies to address domain discrepancies through adaptive personalization. For example, Lyu et al. [54] introduce ACM-FedMRI, a dual mechanism that leverages client-specific hypernetworks to generate adaptive channel-selection weights from client ID embeddings, thereby enabling personalized feature recalibration and preserving high-frequency recovery.
Similarly, Feng et al. [55] decouple the reconstruction model into a globally shared encoder and client-specific decoders, facilitating personalized collaborative reconstruction and adaptive parameterization of decoding layers to mitigate domain shifts. In addition, Guo et al. [56] propose the FL-MRCM framework, which uses adversarial training between local reconstruction networks and a shared domain identifier to align intermediate feature distributions across source and target sites, thus achieving domain-invariant reconstruction.
Federated frameworks have also focused on multi-modal and cross-protocol harmonization by leveraging cross-modal knowledge transfer and modality synthesis. Yan et al. [57] develop Fed-PMG, which addresses modality deficiency by clustering amplitude spectra from multi-modal participants into centroids to enable missing modality synthesis through local phase retention and centroid-based amplitude interpolation.
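The amplitude/phase idea can be sketched in 1-D. The steps are: take the Fourier transform of a local signal; keep its phase, which is commonly associated with structure; move its amplitude, which is commonly associated with appearance, toward a shared centroid; and transform back. This sketch is not the authors' implementation. It assumes a single centroid computed as the mean amplitude spectrum of several sites, in place of Fed-PMG's clustering, and uses a naive O(n²) DFT so that it is self-contained. Real images would use a 2-D FFT.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform (O(n^2)), kept dependency-free."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse of dft()."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def centroid_amplitude(signals):
    """Mean amplitude spectrum over sites (stand-in for a cluster centroid)."""
    spectra = [[abs(c) for c in dft(s)] for s in signals]
    return [sum(col) / len(col) for col in zip(*spectra)]


def amplitude_interpolate(signal, centroid_amp, alpha):
    """Keep the signal's phase; blend its amplitude toward the centroid.

    alpha = 0 returns the original signal; alpha = 1 uses the centroid amplitude.
    """
    mixed = []
    for coeff, c_amp in zip(dft(signal), centroid_amp):
        amp = (1 - alpha) * abs(coeff) + alpha * c_amp
        mixed.append(cmath.rect(amp, cmath.phase(coeff)))
    # If no local coefficient is exactly zero (where phase is undefined), the mixed
    # spectrum stays conjugate-symmetric, so the inverse is real up to rounding.
    return [v.real for v in idft(mixed)]


site_a = [1.0, 2.0, 0.5, -1.0, 3.0]
site_b = [0.5, 0.5, 3.0, 1.0, -2.0]
centroid = centroid_amplitude([site_a, site_b])
print([round(v, 3) for v in amplitude_interpolate(site_a, centroid, alpha=0.5)])
```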
In parallel, pFLSynth [58] tackles intra-modality heterogeneity in multi-contrast MRI synthesis by designing personalized normalization and attention blocks to adaptively modulate feature statistics for individual sites and contrast translation tasks, while partially aggregating later generator stages to balance generalization and specialization.
These two frameworks demonstrate divergent strategies: Fed-PMG emphasizes cross-modal consistency through spectral alignment, whereas pFLSynth prioritizes site- and task-specific adaptation within a unified model architecture, both aiming to enhance robustness against heterogeneous data distributions in federated medical imaging. Vertical FL frameworks, such as Fed-CRFD [59], also contribute by disentangling modality-invariant and modality-specific features through adversarial training, thereby reducing domain shifts caused by heterogeneous imaging protocols while preserving data privacy.
In response to the practical constraints of FL deployments in clinical environments, recent works have prioritized communication efficiency and lightweight architectures. FedPR [60] utilizes prompt-based learning to communicate only compact visual prompts while freezing backbone networks, significantly reducing transmission costs. Meanwhile, GAutoMRI [61] automates neural architecture search for physics-informed models using parameter-efficient dilated convolution.
FedGIMP [62] further decouples global generative priors from site-specific acquisition physics and enables collaborative training without the need to share sensitive coil sensitivity maps.
Another key trend is the incorporation of MRI physics into federated frameworks to enhance the fidelity of reconstruction. ModFed [63] integrates unfolded neural networks with MR physics priors. It uses adaptive dynamic aggregation and spatial Laplacian attention to bolster edge recovery and reconstruction performance. Elmas et al. [62] propose a two-stage approach in FedGIMP, where cross-client learning is first used to generate global MRI priors through adversarial models, which are then incorporated into the imaging network for personalized reconstruction.
Privacy preservation remains a critical concern in FL. Ahmed et al. [64] integrate differential privacy (DP) [67] with encrypted aggregation and use GAN-based generators to decouple raw data from the collaborative training process. Other methods, such as SSFedMRI [65], combine self-supervised contrastive learning [68] with lightweight MoDL (Model-based deep learning) architectures. They also use physics-based supervisory signals generated from k-space re-undersampling, which help reduce the reliance on fully sampled data. FedGraphMRI-Net [66] demonstrates a novel application of graph neural networks by partitioning MRI data into spatially coherent subgraphs through Louvain clustering, which minimizes raw data exposure while enhancing the modeling of anatomical correlations.
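The privacy mechanism of [64] is not detailed here. The sketch below instead shows the standard clip-and-noise Gaussian mechanism often used for DP in FL. Each client update is clipped to a maximum L2 norm C and then perturbed with Gaussian noise of standard deviation σC. The parameter names and values are illustrative. Computing the resulting (ε, δ) guarantee requires a separate privacy accountant.

```python
import math
import random


def privatize_update(update, clip_norm, noise_multiplier, rng):
    """Clip an update to L2 norm <= clip_norm, then add Gaussian noise.

    The noise standard deviation is noise_multiplier * clip_norm (Gaussian
    mechanism). Turning this into an (epsilon, delta) guarantee needs a separate
    privacy accountant, which is not shown.
    """
    norm = math.sqrt(sum(u * u for u in update))
    factor = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    clipped = [u * factor for u in update]
    sigma = noise_multiplier * clip_norm
    return [c + rng.gauss(0.0, sigma) for c in clipped]


rng = random.Random(0)  # seeded only to make the example reproducible
print(privatize_update([3.0, 4.0], clip_norm=1.0, noise_multiplier=0.5, rng=rng))
```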
Levac et al. [69] further tackle data scarcity by employing adaptive federated optimization algorithms to mitigate client drift in non-IID settings. They use SCAFFOLD [70], which corrects client drift with control variates, and FedAdam [71], which applies momentum-based server updates with per-coordinate adaptive step sizes. As a result, they achieve structural similarity index measure (SSIM) values comparable to centralized training despite limited communication rounds.
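FedAdam moves adaptivity to the server. The average client change acts as a pseudo-gradient Δ that updates Adam-style moments, and the global model then moves by η·m/(√v + τ). The sketch below follows that update with a uniform client average. The hyperparameter defaults are assumed values. SCAFFOLD's control variates are a different mechanism and are not shown.

```python
import math


class FedAdamServer:
    """Server-side Adam-style update in the spirit of FedAdam.

    Per round, with Delta = mean(client models) - x:
        m <- b1 * m + (1 - b1) * Delta
        v <- b2 * v + (1 - b2) * Delta**2
        x <- x + eta * m / (sqrt(v) + tau)
    """

    def __init__(self, x, eta=0.1, b1=0.9, b2=0.99, tau=1e-3):
        self.x = list(x)
        self.m = [0.0] * len(self.x)
        self.v = [0.0] * len(self.x)
        self.eta, self.b1, self.b2, self.tau = eta, b1, b2, tau

    def step(self, client_models):
        n = len(client_models)
        avg = [sum(cm[i] for cm in client_models) / n for i in range(len(self.x))]
        for i, (a, xi) in enumerate(zip(avg, self.x)):
            delta = a - xi  # pseudo-gradient for coordinate i
            self.m[i] = self.b1 * self.m[i] + (1 - self.b1) * delta
            self.v[i] = self.b2 * self.v[i] + (1 - self.b2) * delta * delta
            self.x[i] = xi + self.eta * self.m[i] / (math.sqrt(self.v[i]) + self.tau)
        return list(self.x)


server = FedAdamServer([0.0, 0.0])
print(server.step([[1.0, -1.0], [0.5, -0.5]]))
```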
In summary, FL introduces a collaborative paradigm for MRI reconstruction by enabling the privacy-preserving integration of multi-institutional data. By incorporating adaptive domain alignment, lightweight network architectures, and physics-informed priors, FL facilitates robust and generalizable MRI reconstruction across diverse and heterogeneous clinical settings.
4 Federated Learning in Medical Image Analysis
In smart healthcare, beyond the upstream medical imaging tasks discussed earlier, downstream medical image analysis represents another vital facet of FL in medical imaging. Based on specific clinical objectives, these analysis tasks can be broadly classified into two categories: medical image diagnosis and medical image segmentation. The following sections provide a detailed overview of several representative applications in each category. Related works are summarized in Tables 4 and 5.
4.1 Disease Detection and Diagnosis
FL frameworks have shown notable effectiveness in privacy-preserving disease diagnosis across distributed CT and MRI datasets [84, 85], addressing key challenges such as data heterogeneity and limited annotations. Early approaches adopt hybrid architectures that integrate transfer learning with FL to mitigate cross-institutional feature variability. For example, in CT-based lung cancer detection, Palash et al. [72] propose a dual-phase framework in which transfer learning is first applied to identify optimal feature extractors, such as MobileNetV2 [86], using centralized data. Subsequently, FL is employed with institution-specific preprocessing techniques, including adaptive resizing and augmentation through flipping and rotation, to align with local workflows. This decoupled strategy separates initial feature learning from federated optimization, which enables effective handling of class imbalance in rare cancer subtypes via weighted model aggregation.
For COVID-19 detection, several federated frameworks integrate neural architectures to address cross-site heterogeneity. Kumar et al. [73] introduce a blockchain-FL hybrid model that employs 3D SegCaps [87] for lesion segmentation and Capsule Networks [88] for classification, capturing spatial hierarchies of ground-glass opacities via dynamic routing. Scanner-induced variability is mitigated through spatial resampling and lung window standardization, while blockchain consensus mechanisms ensure tamper-proof gradient verification using cryptographic hashing.
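In a blockchain-FL design, the tamper-evidence component reduces to committing a cryptographic hash of each update and re-checking that hash before aggregation. The sketch below covers only that step. It computes SHA-256 over a hypothetical binary encoding that packs the client ID, the round index, and the float64 parameters. Where the digest is recorded (a ledger) and how consensus is reached are outside the sketch's scope, and the sketch is not the scheme of [73].

```python
import hashlib
import struct


def update_digest(client_id, round_idx, update):
    """SHA-256 digest of a serialized update: two uint32 headers plus float64 values."""
    payload = struct.pack(f"<II{len(update)}d", client_id, round_idx, *update)
    return hashlib.sha256(payload).hexdigest()


def verify(client_id, round_idx, update, recorded_digest):
    """True if the received update matches the previously recorded digest."""
    return update_digest(client_id, round_idx, update) == recorded_digest


digest = update_digest(client_id=3, round_idx=7, update=[0.12, -0.05])
print(verify(3, 7, [0.12, -0.05], digest))  # → True
print(verify(3, 7, [0.12, -0.04], digest))  # → False
```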
In parallel, Lai et al. [74] propose a communication-efficient approach that replaces conventional weight transmission with global feature vector averaging. By incorporating contrastive learning, the method aligns client-specific CT features with class prototypes, which enhances inter-client consistency. The framework supports heterogeneous local models, including ResNet and lightweight CNNs, and addresses computational disparities across institutions by jointly optimizing cross-entropy and contrastive feature losses.
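Exchanging class prototypes instead of weights can be sketched as follows. Each client averages its feature vectors per class. The server then forms count-weighted global prototypes that clients can use as contrastive anchors. Because only feature-space vectors are shared, clients may run different backbones, provided they output the same feature dimension. The function names are hypothetical. The contrastive loss itself and the exact procedure of [74] are not reproduced.

```python
def local_prototypes(features, labels):
    """Per-class mean feature vectors and per-class counts on one client."""
    sums, counts = {}, {}
    for f, y in zip(features, labels):
        if y not in sums:
            sums[y] = [0.0] * len(f)
            counts[y] = 0
        sums[y] = [s + v for s, v in zip(sums[y], f)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}, counts


def aggregate_prototypes(client_results):
    """Server: count-weighted average of each class prototype across clients."""
    sums, totals = {}, {}
    for protos, counts in client_results:
        for y, p in protos.items():
            n = counts[y]
            acc = sums.setdefault(y, [0.0] * len(p))
            sums[y] = [a + n * v for a, v in zip(acc, p)]
            totals[y] = totals.get(y, 0) + n
    return {y: [s / totals[y] for s in sums[y]] for y in sums}


client_a = local_prototypes([[1.0, 0.0], [3.0, 0.0]], [0, 0])
client_b = local_prototypes([[5.0, 0.0], [0.0, 1.0]], [0, 1])
print(aggregate_prototypes([client_a, client_b]))  # → {0: [3.0, 0.0], 1: [0.0, 1.0]}
```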
Blockchain-enhanced FL systems for lung cancer detection [75] likewise combine transfer-learned feature extraction with federated refinement. They encrypt parameter exchanges to enhance security and employ adaptive histogram equalization to standardize CT images acquired with diverse scanning protocols.
Beyond disease-specific applications, serial FL paradigms such as SiFT [76] incorporate continual learning to support multi-class classification. SiFT aligns text-guided feature projections with biomedical semantics, which addresses class imbalance in CT analysis by reinforcing semantic consistency. In brain tumor diagnosis, distributed frameworks [77] adopt EfficientNet architectures with dynamic regularization to mitigate non-IID data effects. These frameworks apply adaptive aggregation and preprocessing techniques, such as random cropping, to enhance generalization across heterogeneous clinical sites.
Federated approaches have also been extended to cross-modality tasks, where techniques such as modality-specific preprocessing and domain-aware aggregation facilitate adaptation to heterogeneous data sources. In fMRI-based autism spectrum disorder classification, mixture-of-experts (MoE) architectures [89] combined with adversarial domain alignment dynamically adjust to scanner-specific features while ensuring privacy via gradient randomization [78].
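The routing component of an MoE can be illustrated with a minimal NumPy sketch. It assumes linear experts and a softmax gate, so that each sample's output is a gate-weighted sum of expert outputs. Adversarial alignment and gradient randomization are omitted, and the shapes are hypothetical. This is not the architecture of [78] or [89].

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def moe_forward(x, expert_weights, gate_weights):
    """Soft mixture of linear experts.

    x: (N, D) inputs; expert_weights: (E, D, K); gate_weights: (D, E).
    Returns the (N, K) mixed output and the (N, E) gate probabilities.
    """
    gates = softmax(x @ gate_weights)                          # per-sample expert weights
    expert_out = np.einsum("nd,edk->nek", x, expert_weights)   # every expert's output
    return np.einsum("ne,nek->nk", gates, expert_out), gates
```

In a scanner-aware setting, the gate can learn to send features with site-specific characteristics to different experts. The shared experts remain trainable through federated aggregation.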
In brain MRI anomaly detection, parameter disentanglement strategies separate global anatomical representations from client-specific intensity variations, while latent contrastive learning enforces consistency across intensity-augmented scans by aligning shape-based embeddings. Simultaneously, self-supervised inpainting reduces false positives caused by acquisition artifacts and allows collaborative learning of anatomical patterns without exposing raw data [79].
Holistic frameworks, such as FedMedICL [80], further address multi-source distribution shifts, including label imbalance and demographic variability, by incorporating class-balancing techniques and adaptive batch normalization. These strategies enhance model robustness in dynamic clinical contexts, including emerging disease scenarios like COVID-19.
For chest X-ray scans, Kaissis et al. [81] propose PriMIA, an open-source platform that combines secure aggregation, DP gradient descent, and secure multi-party computation to enable encrypted training and remote inference on pediatric chest radiographs. PriMIA achieves expert-level classification performance without exposing raw imaging data. Similarly, Dayan et al. [82] develop EXAM, a client-server FL model that jointly leverages chest X-ray and electronic medical record features from 20 international centers to predict oxygen requirements in COVID-19 patients. EXAM demonstrates significant improvements in both accuracy and cross-site generalizability compared with locally trained models. These end-to-end privacy-preserving systems underscore the potential of FL to deliver robust, secure AI-driven diagnostics and prognostics in real-world, multi-institutional healthcare networks.
For classification and segmentation tasks across heterogeneous medical imaging modalities, Xie et al. [83] propose MH‑pFLGB, a model‑heterogeneous personalized federated learning framework that combines a lightweight global bypass model with a feature‑weighted fusion module to reconcile both statistical and system heterogeneity among clients with diverse network architectures, improving classification and segmentation performance without relying on public datasets.
4.2 Medical Image Segmentation
FL has emerged as a robust framework for collaborative segmentation of anatomical structures and pathological features in medical imaging [85]. By facilitating the integration of distributed annotations, FL addresses key challenges such as partial labeling and domain shifts. In multi-organ CT segmentation, Kim et al. [90] leverage KD and uncertainty-aware aggregation to handle incomplete labels across multiple institutions. Their dual distillation strategy preserves organ-specific features by aligning local predictions with global feature representations. A lightweight multi-head U-Net with a shared encoder segments seven abdominal structures simultaneously and mitigates catastrophic forgetting in non-IID settings while maintaining computational efficiency.
Building on this foundation, subsequent approaches such as FUNAvg [91] introduce uncertainty-aware Bayesian aggregation into multi-organ segmentation. This method facilitates implicit knowledge transfer across institutions with incompatible labeling protocols and thus enhances segmentation consistency and robustness in heterogeneous clinical environments. Galati et al. [100] propose a two-stage federated segmentation framework for non-IID multi-center data. The first stage learns a shared disentangled latent space via adversarial and contrastive objectives. The second stage enables asynchronous adaptation with limited local annotations and synthesized samples. Together, the two stages achieve stable performance across diverse imaging tasks and uneven label distributions.
Liver-related segmentation tasks further illustrate the effectiveness of FL in harmonizing heterogeneous annotations across institutions. FedDUS [92] introduces a semi-supervised federated self-supervised learning (FSSL) framework, which incorporates dynamic client weighting and FAIR principles to standardize unlabeled CT data. To address domain shifts in liver tumor segmentation, federated adaptations of nnU-Net [93] analyze inter-institutional distribution discrepancies and employ intensity normalization strategies to enhance cross-site consistency.
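Cross-site intensity normalization can be sketched in the style of nnU-Net's CT scheme. Foreground Hounsfield units are clipped to the 0.5th and 99.5th percentiles and then z-scored. The sketch assumes that clients share only summary statistics. Pooled mean and standard deviation are exact when computed from counts, sums, and sums of squares. Percentiles cannot be pooled exactly from summaries, so the count-weighted averaging of client percentiles below is a labeled simplification. The function names are hypothetical and do not come from [93].

```python
import numpy as np


def client_stats(foreground_hu):
    """Per-client summary of foreground intensities; no voxels leave the site."""
    v = np.asarray(foreground_hu, dtype=np.float64).ravel()
    return {"n": v.size, "sum": v.sum(), "sumsq": (v ** 2).sum(),
            "p005": np.percentile(v, 0.5), "p995": np.percentile(v, 99.5)}


def pool_stats(stats):
    """Server: exact pooled mean/std; approximate (count-weighted) clipping bounds."""
    n = sum(s["n"] for s in stats)
    mean = sum(s["sum"] for s in stats) / n
    var = sum(s["sumsq"] for s in stats) / n - mean ** 2
    # Simplifying assumption: average client percentiles instead of true pooled percentiles.
    lower = sum(s["n"] * s["p005"] for s in stats) / n
    upper = sum(s["n"] * s["p995"] for s in stats) / n
    return lower, upper, mean, np.sqrt(max(var, 0.0))


def normalize_ct(volume, lower, upper, mean, std):
    """Clip to the shared intensity window, then z-score with the shared statistics."""
    return (np.clip(volume, lower, upper) - mean) / max(std, 1e-8)
```

Each site then applies `normalize_ct` with the same shared parameters. This removes one source of inter-institutional intensity discrepancy before federated training.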
Hybrid cascaded models, such as Hybrid-ResUNet [94], combine 2D liver localization with 3D tumor delineation and employ transfer learning from related tumor datasets to improve model generalization for hepatocellular carcinoma segmentation. In pancreatic segmentation, Wang et al. [95] develop a federated framework designed to address domain shifts between healthy and pathological CT datasets collected from institutions in Taiwan and Japan. Their approach integrates neural architecture search with variational autoencoder regularization to align latent feature representations across institutions for robust multi-institutional segmentation.
Head and neck tumor segmentation frameworks incorporate vision transformers to capture long-range dependencies from PET/CT dual-modality data. These frameworks achieve performance comparable to centralized training through secure aggregation and DP mechanisms [96]. In cardiac CT analysis, semi-supervised FL strategies distill pseudo-labels derived from task-specific CNNs into transformer-based architectures, facilitating accurate landmark detection and calcification segmentation in sparsely annotated datasets [97].
FedDG [98] advances federated domain generalization for medical image segmentation by integrating frequency-space distribution interpolation with boundary-aware meta-learning. This framework supports collaborative training across decentralized, multi-hospital datasets while preserving data privacy by exchanging only the amplitude spectra of Fourier-transformed images and retaining the local phase spectra confidentially. The recombination of amplitude and phase components generates continuous style variations across domains, which enables the models to effectively generalize to unseen scanners and imaging protocols without exposing patient-sensitive information encoded within phase data.
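The core frequency-space operation can be sketched with NumPy's FFT. A local image's amplitude spectrum is linearly interpolated with an amplitude spectrum received from another site, and the result is recombined with the local phase. As a simplification, this sketch interpolates the full spectrum. FedDG itself restricts the interpolation to a low-frequency band and samples the mixing ratio during training, and its boundary-aware meta-learning is not shown.

```python
import numpy as np


def amplitude_phase(img):
    """Amplitude and phase spectra of a 2D image."""
    f = np.fft.fft2(img)
    return np.abs(f), np.angle(f)


def interpolate_style(local_img, foreign_amplitude, lam):
    """Mix the local amplitude with a foreign one (ratio lam), keep the local phase,
    and return to the image domain. Only amplitude spectra cross site boundaries."""
    amp, phase = amplitude_phase(local_img)
    mixed_amp = (1.0 - lam) * amp + lam * foreign_amplitude
    # For real inputs the mixed spectrum stays Hermitian, so the inverse is real
    # up to round-off; np.real discards the negligible imaginary part.
    return np.real(np.fft.ifft2(mixed_amp * np.exp(1j * phase)))
```

With `lam = 0` the local image is returned unchanged. Increasing `lam` toward 1 shifts its low-level appearance toward the other site while the phase, which carries most of the anatomical structure, stays local.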
FL also plays a critical role in enhancing fairness and robustness in medical image segmentation. Jiang et al. [99] propose FedCE, which dynamically reweights client contributions according to gradient- and data-space metrics and reduces performance disparities in prostate MRI segmentation across institutions with varying imaging protocols.
FL has also been extended to cross-modality synthesis tasks. In brain MRI-to-CT conversion, Raggio et al. [101] propose FedSynthCT-Brain, a cross-silo horizontal FL framework for multi-institutional MRI-to-synthetic-CT generation. By employing a residual U-Net backbone and median-voting fusion of axial, coronal, and sagittal predictions, their approach harmonizes heterogeneous imaging protocols without sharing raw data. Evaluation on both internal and external cohorts showed that FedSynthCT-Brain maintains consistent image quality and anatomical accuracy across sites, demonstrating the feasibility of privacy-preserving, federated cross-modality synthesis in real-world clinical settings.
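Median-voting fusion of orientation-specific predictions reduces to a voxel-wise median. The following minimal sketch assumes that the axial, coronal, and sagittal synthetic-CT volumes have already been resampled to the same 3D grid. The function name is hypothetical and is not taken from [101].

```python
import numpy as np


def median_fuse(axial, coronal, sagittal):
    """Voxel-wise median of three synthetic-CT volumes (HU) on a common grid."""
    return np.median(np.stack([axial, coronal, sagittal], axis=0), axis=0)


# A voxel where one view hallucinates bone (1500 HU) keeps a soft-tissue value:
fused = median_fuse(np.array([40.0]), np.array([1500.0]), np.array([42.0]))
print(fused)  # [42.]
```

With three views, the median discards the single most extreme prediction at each voxel. It is therefore more robust than averaging to orientation-specific artifacts.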
These innovations collectively demonstrate the capability of FL to overcome annotation variability, data silos, and domain shifts in medical image segmentation, all while strictly adhering to privacy constraints.
5 Future Perspectives
While significant progress has been made in applying FL to medical image reconstruction and analysis in recent years, several critical challenges remain unresolved. Future research should focus on enhancing model adaptability, strengthening privacy preservation, and improving computational efficiency across diverse medical imaging tasks. This section elaborates on these challenges and outlines potential avenues for future advancement.
5.1 Special Characteristics of Medical Imaging
Although existing methods have shown encouraging results, the majority still treat LDCT as a denoising problem and often neglect the reconstruction process itself. This limits their compatibility with iterative unrolling-based CT reconstruction networks. In medical imaging, however, image formation is fundamentally based on reconstruction from measurement domain data guided by well-established physical models [102]. Many existing frameworks fail to fully exploit this measurement domain information, which constrains the optimization potential of reconstruction models. Iterative unrolling techniques, while grounded in physical modeling, typically require complex constraints and multi-step computations [103], which substantially increase model complexity and computational burden.
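Unrolling and its cost can be made concrete with a minimal sketch. The sketch assumes a linear forward model `A` (a small dense matrix standing in for a CT projector) and measurements `y`. Each unrolled stage performs a data-consistency gradient step on the least-squares term and then applies a regularizer `D`. In a learned unrolled network, the step sizes and `D` would be trainable per stage. Here `D` defaults to the identity, which reduces the loop to plain gradient descent.

```python
import numpy as np


def unrolled_reconstruction(A, y, step_sizes, denoise=lambda x: x):
    """K unrolled stages of x <- D(x - eta_k * A^T (A x - y)).

    Every stage calls the forward and back projectors once. This repeated
    physics computation is what makes unrolled networks costly to train locally.
    """
    x = A.T @ y                                  # back-projection initialization
    for eta in step_sizes:
        x = denoise(x - eta * A.T @ (A @ x - y))  # data-consistency step + regularizer
    return x


rng = np.random.default_rng(0)
A = rng.normal(size=(30, 20))                    # hypothetical toy system matrix
x_true = rng.normal(size=20)
y = A @ x_true
eta = 1.0 / np.linalg.norm(A, 2) ** 2            # step size below 2 / ||A||^2 for stability
residual_5 = np.linalg.norm(A @ unrolled_reconstruction(A, y, [eta] * 5) - y)
residual_50 = np.linalg.norm(A @ unrolled_reconstruction(A, y, [eta] * 50) - y)
```

More stages improve data consistency but multiply the per-sample computation, and in a learned setting they also multiply the number of stage-specific parameters that must be communicated.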
Within the FL paradigm, these factors significantly amplify communication overhead and local computational demand, particularly on resource-limited edge devices such as mobile platforms and compact medical terminals. Such inefficiencies present substantial barriers to scalable training and real-world deployment in clinical settings.
Bridging this gap necessitates the development of FL-compatible iterative unrolling algorithms that balance high reconstruction fidelity with reduced computational and communication costs. Equally important is the effective integration of measurement domain data into FL-driven optimization, alongside the design of efficient and privacy-preserving client-server communication protocols. These advancements are essential for evolving current methodologies into fully-fledged reconstruction frameworks and ensuring their successful translation from theoretical innovation to clinical practice.
5.2 Privacy Preservation Challenges
Although FL is inherently designed to preserve data privacy by restricting raw data to local devices, it fundamentally relies on the assumption that all participants—both clients and servers—are fully trustworthy and free of malicious intent. In practical deployments, however, this assumption often fails to hold, exposing FL systems to a range of privacy threats. These risks are primarily manifested in the following aspects:
1. While FL safeguards data privacy by transmitting only model updates, such as gradients or parameters, rather than raw data, it remains vulnerable to privacy leakage. Adversaries can exploit these updates using optimization-based reverse engineering techniques to infer client-specific inputs or latent features [104]. In particular, malicious entities may intercept gradients during transmission and reconstruct sensitive patient data, which poses a severe threat to confidentiality in FL-enabled medical systems.
2. While current efforts mainly focus on data privacy protection [105], the intellectual property of model architectures remains under-protected and vulnerable to exploitation. This issue is particularly critical in cross-institutional collaborations and commercial deployments, where proprietary model designs and trained parameters can be subject to reverse engineering, unauthorized duplication, or malicious misuse.
3. Pruned weight masks, which indicate retained and removed parameters, can inadvertently reveal client-specific data distributions. As demonstrated by Yuan et al. [106], adversaries may infer membership of private samples by analyzing mask patterns. Furthermore, Chu et al. [107] derive information-theoretic bounds on leakage from pruned FL models and propose PriPrune, which demonstrates that naive pruning strategies are insufficient for privacy protection. These mask-based attacks highlight the urgent need for privacy-preserving pruning schemes, such as randomized or encrypted masks, in federated medical imaging scenarios.
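Why shared gradients leak inputs can be seen in the simplest case, shown in the sketch below. The sketch assumes a single linear layer `z = W x + b`, one training sample, and a softmax cross-entropy loss. Since dL/dW = (dL/dz) xᵀ and dL/db = dL/dz, any row of the weight gradient divided by the matching bias gradient returns the input exactly. Deeper networks and averaged mini-batches require the optimization-based attacks cited in [104]. This toy case only illustrates the underlying mechanism.

```python
import numpy as np


def linear_softmax_grads(W, b, x, label):
    """Gradients of cross-entropy for one sample through z = W x + b."""
    z = W @ x + b
    p = np.exp(z - z.max())
    p /= p.sum()
    dz = p.copy()
    dz[label] -= 1.0                  # dL/dz for softmax cross-entropy
    return np.outer(dz, x), dz        # dL/dW = dz x^T, dL/db = dz


def invert_linear_grads(dW, db):
    """Recover the input from shared gradients: row i of dW equals db[i] * x."""
    i = np.argmax(np.abs(db))         # pick the best-conditioned row
    return dW[i] / db[i]


rng = np.random.default_rng(0)
W = rng.normal(size=(3, 16))
b = np.zeros(3)
x_private = rng.random(16)            # e.g., a flattened image patch
dW, db = linear_softmax_grads(W, b, x_private, label=1)
x_recovered = invert_linear_grads(dW, db)
```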
Although DP has been integrated into FL to safeguard gradient information, achieving an optimal trade-off between privacy preservation and model performance remains a persistent challenge [108]. Insufficient noise injection may fail to provide sufficient privacy protection, whereas excessive noise can significantly degrade model accuracy and hinder convergence during training.
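The trade-off can be illustrated with the Gaussian mechanism applied to a single client update, as in DP-SGD-style training. The update is clipped to an L2 norm of `clip_norm`, and Gaussian noise with standard deviation `noise_multiplier * clip_norm` is added to every coordinate. The parameter names are illustrative. Converting a noise multiplier into an (ε, δ) budget requires a privacy accountant that also tracks sampling rate and number of rounds, which is not shown.

```python
import numpy as np


def clip_and_noise(update, clip_norm, noise_multiplier, rng):
    """Gaussian mechanism on one client update.

    Clipping bounds each client's influence (sensitivity); the added noise
    scales with that bound. Larger noise_multiplier means stronger privacy
    and a noisier update.
    """
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / (norm + 1e-12))
    return clipped + rng.normal(0.0, noise_multiplier * clip_norm, size=update.shape)


rng = np.random.default_rng(0)
u = 5 * rng.normal(size=1000)
clean = clip_and_noise(u, 1.0, 0.0, rng)                 # clipping only
err_low = np.linalg.norm(clip_and_noise(u, 1.0, 0.1, rng) - clean)
err_high = np.linalg.norm(clip_and_noise(u, 1.0, 2.0, rng) - clean)
```

For a 1000-dimensional update clipped to norm 1, a noise multiplier of 2 adds noise with an expected norm of roughly 2·√1000 ≈ 63. This dwarfs the signal, which is the accuracy and convergence penalty described above.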
Alternatively, cryptographic techniques such as homomorphic encryption (HE) [109] and secure multi-party computation (SMPC) [110] provide strong privacy guarantees without compromising model performance, as they enable the aggregation of gradients or parameters without decrypting or otherwise revealing individual client updates [111]. However, these methods impose considerable computational and communication burdens [112], which can be prohibitive in resource-intensive tasks such as medical image reconstruction, leading to prolonged training times and reduced scalability.
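An SMPC-flavored idea, pairwise additive masking, shows how a server can obtain the exact sum of updates without seeing any individual one. The sketch assumes that each pair of clients (i, j) derives a common random mask. Client i adds the mask and client j subtracts it, so all masks cancel in the aggregate. The shared seed stands in for a pairwise key agreement. Production protocols additionally use finite-field arithmetic and handle client dropouts, and HE is a different mechanism altogether.

```python
import numpy as np


def masked_updates(updates, seed=0):
    """Pairwise additive masking: for i < j, client i adds m_ij and client j
    subtracts it. Each masked vector looks random, but the masks cancel in the sum.
    The (seed, i, j) seeding is a stand-in for a real pairwise key agreement."""
    n = len(updates)
    masked = [u.astype(np.float64).copy() for u in updates]
    for i in range(n):
        for j in range(i + 1, n):
            m = np.random.default_rng([seed, i, j]).normal(size=updates[i].shape)
            masked[i] += m
            masked[j] -= m
    return masked


rng = np.random.default_rng(1)
client_updates = [rng.normal(size=5) for _ in range(4)]
server_view = masked_updates(client_updates)
aggregate = sum(server_view)          # equals sum(client_updates) up to round-off
```

The extra cost grows with the number of client pairs and with the size of each update. For large reconstruction networks, this contributes to the overheads noted above.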
To address model privacy concerns, split learning has emerged as a promising alternative [113]. Nevertheless, its inherently sequential training workflow offers lower efficiency than FL, limiting its scalability in large-scale distributed systems. Recent hybrid approaches, such as SplitFed [114], aim to integrate split learning with FL to jointly preserve both data and model privacy while improving training efficiency via parallelism. However, these methods require the exchange of intermediate-layer features between clients and servers. Unlike high-level semantic tasks, medical image reconstruction is a low-level semantic task that typically avoids aggressive feature downsampling. As a result, the intermediate features are considerably larger in volume, which leads to communication overheads that far exceed those in conventional FL frameworks and potentially forms a critical bottleneck in real-world clinical deployment [115].
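A back-of-the-envelope calculation illustrates the scale of this overhead. The shapes are assumptions chosen only for illustration. Suppose a reconstruction network is cut at a full-resolution layer with 64 channels on a 512×512 slice, a classifier is cut after downsampling to 512 channels at 16×16, and a hypothetical 2-million-parameter network is used for comparison. All values are stored as float32.

```python
def tensor_mib(*shape, bytes_per_value=4):
    """Size in MiB of a float32 tensor with the given shape."""
    n = 1
    for s in shape:
        n *= s
    return n * bytes_per_value / 2**20


recon_cut = tensor_mib(64, 512, 512)   # split learning, reconstruction: per slice, per forward pass
cls_cut = tensor_mib(512, 16, 16)      # split learning, classification: per image, per forward pass
fl_upload = tensor_mib(2_000_000)      # FL: whole model, once per round
print(recon_cut, cls_cut, round(fl_upload, 2))  # 64.0 0.5 7.63
```

Under these assumptions, one reconstruction slice produces 64 MiB of cut-layer activations, and a comparable amount of gradients flows back. That single slice already exceeds a full FL model upload by about a factor of eight, and it is 128 times larger than the classifier's cut-layer tensor.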
5.3 Security Considerations
DL algorithms, traditionally regarded as black-box models, have increasingly been identified as vulnerable to security threats such as backdoor attacks, necessitating robust safeguards in real-world deployments [116]. These concerns are further exacerbated in FL frameworks due to their inherently decentralized training paradigm. The server’s inability to access raw client data limits the effectiveness of conventional security monitoring techniques and renders the detection and mitigation of adversarial behaviors significantly more challenging.
Moreover, the intrinsic heterogeneity among FL participants, including non-identical data distributions, varying computational resources, and inconsistent compliance with security protocols, undermines the ability to ensure uniformly benign behavior across clients. This architectural limitation exposes FL training to a wide range of security vulnerabilities, including data poisoning [117] and Byzantine failures [118], both of which can compromise the integrity and stability of the aggregated global model. As a result, reconciling the dual demands of strict data privacy and robust training security has emerged as a critical research priority in advancing FL systems.
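The vulnerability of plain averaging, and one commonly studied mitigation, can be sketched as follows. Coordinate-wise median and trimmed-mean aggregation are named here as general Byzantine-robust techniques, not as methods from [117] or [118]. The sketch assumes a minority of corrupted clients. Under strongly non-IID data, such robust aggregators can also down-weight honest but atypical sites.

```python
import numpy as np


def coordinate_median(updates):
    """Per-coordinate median of client updates."""
    return np.median(np.stack(updates), axis=0)


def trimmed_mean(updates, trim):
    """Drop the `trim` largest and `trim` smallest values per coordinate, average the rest."""
    s = np.sort(np.stack(updates), axis=0)
    return s[trim:len(updates) - trim].mean(axis=0)


honest = [np.full(3, v) for v in (0.9, 1.0, 1.1, 1.05)]
byzantine = [np.full(3, 1000.0)]           # one poisoned update
all_updates = honest + byzantine
naive = np.mean(np.stack(all_updates), axis=0)   # about 200.8 per coordinate
robust = coordinate_median(all_updates)          # 1.05 per coordinate
```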
5.4 Communication Efficiency
Unlike conventional centralized training paradigms, FL operates in a distributed manner and requires careful management of communication costs throughout the training process [119]. FL relies on frequent bidirectional communication between clients and a central server to exchange model updates [120]. In the context of medical image analysis, characterized by high-dimensional data and models with large parameter counts, this leads to considerable communication overhead.
Compounding this issue, heterogeneity in client computational capabilities and unstable network conditions can cause asynchronous or delayed model uploads from certain participants. These disruptions hinder synchronized global model updates and may ultimately impede training efficiency [121]. To address these challenges, researchers have explored communication-efficient strategies, including model compression, sparse gradient transmission, and matrix factorization [122]. These techniques are essential for scaling FL across large client networks in hospital systems and ensure that the deployment of FL remains feasible and efficient in real-world clinical environments.
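Sparse gradient transmission can be illustrated with top-k sparsification plus error feedback. Each client sends only the k largest-magnitude entries of its (flattened) update as index/value pairs. The untransmitted remainder is kept locally and added to the next round's update so that it is not lost. The class and function names are hypothetical, and this is one common variant rather than the specific method of [122].

```python
import numpy as np


class TopKCompressor:
    """Top-k sparsification with a local error-feedback residual."""

    def __init__(self, k):
        self.k = k
        self.residual = None

    def compress(self, update):
        if self.residual is None:
            self.residual = np.zeros_like(update)
        acc = update + self.residual                       # re-inject previously dropped mass
        idx = np.argpartition(np.abs(acc), -self.k)[-self.k:]
        values = acc[idx]
        self.residual = acc.copy()
        self.residual[idx] = 0.0                           # what was sent is no longer owed
        return idx, values                                 # only k indices and k values go upstream


def decompress(idx, values, size):
    """Server side: rebuild a dense vector from the sparse message."""
    out = np.zeros(size)
    out[idx] = values
    return out


rng = np.random.default_rng(0)
update = rng.normal(size=100)
compressor = TopKCompressor(k=10)                          # transmit 10% of the coordinates
idx, vals = compressor.compress(update)
dense = decompress(idx, vals, update.size)
```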
5.5 Scalability to Large Models
Training parameter-intensive networks, such as 3D convolutional neural networks [123] and vision transformers [124], in an FL environment imposes substantial computational and communication demands on client devices. The size of model updates can be prohibitively large, which makes naive transmission over bandwidth-constrained networks impractical [33, 125]. To address these limitations, advanced strategies such as model compression, layer-wise training, and split learning are being actively explored to enhance the scalability and practicality of FL for large-scale models.
For instance, adaptive mutual KD techniques have demonstrated the ability to reduce communication costs by over 90% with minimal impact on model accuracy [33]. Federated model compression has emerged as a key solution for alleviating bandwidth and memory constraints [126, 127], thereby facilitating efficient deployment of high-capacity models in distributed healthcare environments. For example, pruning removes redundant parameters and thereby reduces the size of updates transmitted by each client. Stripelis et al. [128] propose FedSparsify, a method that prunes up to 95% of model weights during federated brain age prediction on MRI data without performance degradation. Quantization compresses model weights into low-bit formats, which cuts bandwidth consumption and accelerates local inference. Gupta et al. [129] applied int8 quantization via Quantization-Aware Training to a 1D-CNN for autism spectrum disorder classification on the ABIDE-1 fMRI dataset, enabling edge deployment with minimal accuracy loss. Low-Rank Adaptation (LoRA) attaches lightweight trainable adapters to a frozen backbone, which limits transmitted updates to only a few megabytes. For instance, Zhu et al. [130] develop MeLo, which adds only 0.17% trainable parameters via LoRA to ViT models and achieves state-of-the-art performance across multiple medical diagnosis tasks.
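The reason LoRA shrinks federated updates can be seen in a minimal NumPy sketch of one adapted linear layer. The sketch assumes a frozen weight `W0` plus a trainable low-rank term `(alpha / r) * B @ A`, with `B` initialized to zero so that training starts from the pretrained behavior. Only `A` and `B` would be trained and sent to the server. The 768-wide layer and rank 8 are illustrative assumptions and do not reproduce the MeLo configuration.

```python
import numpy as np


class LoRALinear:
    """Frozen weight W0 plus a trainable low-rank update (alpha / r) * B @ A."""

    def __init__(self, W0, r, alpha, rng):
        d_out, d_in = W0.shape
        self.W0 = W0                                    # frozen backbone weight, never transmitted
        self.A = rng.normal(0.0, 0.01, size=(r, d_in))  # trainable
        self.B = np.zeros((d_out, r))                   # trainable, zero-init: no change at start
        self.scale = alpha / r

    def __call__(self, x):
        return x @ (self.W0 + self.scale * self.B @ self.A).T

    def trainable_params(self):
        return self.A.size + self.B.size                # what a client would upload per round


rng = np.random.default_rng(0)
W0 = rng.normal(size=(768, 768))
layer = LoRALinear(W0, r=8, alpha=16, rng=rng)
fraction = layer.trainable_params() / W0.size           # 12288 / 589824, about 2.1%
```

For this single layer, the upload shrinks by a factor of 48. The much smaller overall fraction reported for MeLo reflects that many backbone weights receive no adapter at all.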
Knowledge distillation can be used to pre-train a compact student model prior to federated training, which reduces the resource requirements for initial deployment. Kim et al. [90] used federated KD for multi‑organ CT segmentation on partially labeled datasets, which regularizes local training with global and organ‑specific teachers to improve accuracy with reduced student model size. Collectively, these techniques reduce network traffic and computational demands during federated rounds, and enable efficient training and real-time inference across heterogeneous hospital hardware.
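The standard distillation objective behind such compact students can be sketched for classification logits as follows. The objective combines a temperature-softened KL divergence to the teacher, scaled by T² as is customary, with ordinary cross-entropy to the labels. The temperature `T` and mixing weight `alpha` are illustrative. The dual, organ-specific segmentation distillation of Kim et al. [90] is more elaborate and is not reproduced here.

```python
import numpy as np


def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """alpha * T^2 * KL(teacher_T || student_T) + (1 - alpha) * CE(labels, student)."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    kl = np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)), axis=-1).mean()
    p = softmax(student_logits, 1.0)
    ce = -np.log(p[np.arange(len(labels)), labels] + 1e-12).mean()
    return alpha * T ** 2 * kl + (1 - alpha) * ce
```

In a federated setting, the teacher logits could come from the current global model, so that the student is regularized toward shared knowledge while it fits local labels.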
Looking ahead, it will be essential to co-optimize diffusion model compression with PACS bandwidth constraints, so that low-precision or pruned diffusion priors can be streamed over DICOM networks without exceeding institutional data transfer limits. Recent medical imaging case studies already demonstrate the potential of compressed models on resource-constrained hardware platforms. Kromer et al. [131] use lightweight CNNs (e.g., 6.25 MiB models) to achieve over 96% diagnostic accuracy for COVID-19 detection on embedded NVIDIA Jetson modules, while optimizing energy-time tradeoffs to reach under 29 W power consumption at 26.3 FPS inference rates. BlazeNeo [132] performs real-time polyp segmentation and neoplasm detection, achieving over 155 FPS in INT8 precision on a Jetson AGX Xavier while maintaining state-of-the-art performance. Zhang et al. [133] adapt hierarchical vision foundation models for real-time ultrasound image segmentation, achieving 77 FPS inference with TensorRT on a single A100 GPU and reporting a mean Dice score exceeding 90% across six public datasets and one in-house dataset. Extending these case studies to federated, diffusion-based reconstruction tasks will require end-to-end hardware-software co-design frameworks that jointly satisfy compression, latency, and clinical workflow requirements.
5.6 Post-Deployment Adaptation
Deployed AI models in clinical settings must remain adaptive to evolving data distributions, driven by shifts in patient demographics, the introduction of new imaging devices, and updates to clinical protocols—factors that can lead to significant performance degradation due to distribution shifts. Federated Continual Learning provides a privacy-preserving framework for lifelong model updates across institutions [134, 135]. In this setting, each client periodically retrains on newly acquired local data and contributes model updates to a global model without disclosing sensitive patient information.
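One round structure for such periodic updates can be sketched with FedAvg-style, sample-size-weighted aggregation. The sketch makes several assumptions. The model is linear least squares, clients take a few local gradient steps on newly acquired cases each round, and the server averages client models in proportion to their sample counts. The targets are simulated and noiseless. Mechanisms against catastrophic forgetting, which federated continual learning methods add on top, are not shown.

```python
import numpy as np


def local_update(w, X, y, lr=0.1, epochs=5):
    """A few full-batch gradient steps on a client's newly acquired data."""
    for _ in range(epochs):
        w = w - lr * X.T @ (X @ w - y) / len(y)
    return w


def fedavg(client_weights, client_sizes):
    """Sample-size-weighted average of client parameters (FedAvg aggregation)."""
    sizes = np.asarray(client_sizes, dtype=np.float64)
    return np.tensordot(sizes, np.stack(client_weights), axes=1) / sizes.sum()


rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0, 0.5])          # hypothetical ground-truth model
w_global = np.zeros(3)
for round_idx in range(10):
    updates, sizes = [], []
    for n_new in (40, 10, 25):               # newly acquired cases at three sites
        X = rng.normal(size=(n_new, 3))
        y = X @ w_true                       # simulated, noiseless targets
        updates.append(local_update(w_global.copy(), X, y))
        sizes.append(n_new)
    w_global = fedavg(updates, sizes)        # only parameters leave each site
```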
Crucially, recent studies highlight the importance of quantifying model uncertainty and detecting post-deployment distribution shifts, as exposure to unseen data can markedly compromise model reliability [136]. To promote robust and adaptive deployment, future research should prioritize integrating FL with domain adaptation and uncertainty estimation techniques, which enable models to autonomously recalibrate or retrain in response to novel clinical scenarios.
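A very simple form of post-deployment shift detection can be sketched by monitoring predictive uncertainty. The sketch assumes access to softmax outputs on an in-distribution reference set. A batch is flagged when its mean predictive entropy exceeds the reference mean by more than `k` standard errors. This heuristic and its threshold are illustrative assumptions. Entropy alone can miss shifts on which the model is confidently wrong, so it would complement, not replace, proper uncertainty estimation.

```python
import numpy as np


def predictive_entropy(probs):
    """Shannon entropy of each row of softmax probabilities."""
    return -np.sum(probs * np.log(probs + 1e-12), axis=-1)


class EntropyShiftMonitor:
    """Flag batches whose mean entropy is abnormally high relative to a reference set."""

    def __init__(self, reference_probs, k=3.0):
        h = predictive_entropy(reference_probs)
        self.mu = h.mean()
        self.sigma = h.std() + 1e-12
        self.k = k

    def is_shifted(self, batch_probs):
        h = predictive_entropy(batch_probs)
        z = (h.mean() - self.mu) / (self.sigma / np.sqrt(len(h)))   # z-score of the batch mean
        return bool(z > self.k)
```

A flagged batch could then trigger recalibration, local fine-tuning, or a federated retraining round, subject to clinical review.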
5.7 Clinical Adoption Barriers
The transition from validated federated models to clinical deployment encounters critical non-technical barriers. Hospital ethics committees (HECs) rigorously evaluate compliance with patient consent protocols, enforce data minimization principles, and assess the risk-benefit trade-offs involved in federated model training and deployment workflows. Simultaneously, data use agreements (DUAs) often require protracted inter-institutional negotiations, as they specify permissible data processing purposes, data retention periods, access rights, and ownership of jointly trained models, while ensuring compliance with regulations like the General Data Protection Regulation (GDPR) [137] and the Health Insurance Portability and Accountability Act (HIPAA) [138]. These administrative processes often take 6 to 12 months, significantly delaying real-world validation.
Operational challenges further impede adoption. The allocation of liability for diagnostic errors remains ambiguous in FL, complicating institutional accountability and commitments. Cybersecurity certifications for hospital integration demand rigorous penetration testing of servers and encrypted communication channels [139]. Seamless integration with clinical systems, such as the Picture Archiving and Communication System (PACS), requires standardized interfaces to avoid workflow disruption. Clinician acceptance depends on transparent model interpretations for edge cases and demonstrable consistency in performance across clinical sites [140]. Addressing these challenges through auditable federated architectures, standardized agreement templates, and clinician-centered interpretability tools is essential for transforming federated medical AI from a technical achievement into a clinically valuable asset.
6 Conclusion
FL has demonstrated significant promise for both the development and deployment of large-scale AI models for medical imaging. By transmitting only model updates, such as gradients or weight differences, rather than raw patient scans, FL enables collaborative training of AI models across various imaging modalities, including CT, MRI, and PET/CT. This approach not only safeguards patient privacy and adheres to regulatory constraints but also capitalizes on heterogeneous patient populations and imaging protocols that no single institution could feasibly collect on its own. Focusing on two core tasks in smart healthcare, namely medical image reconstruction and analysis, this review surveys recent FL-based methodologies and highlights their strategies for handling non-IID data distributions, limited local data volumes, and the need for robust aggregation. At the same time, pronounced data heterogeneity and scarcity in medical imaging pose distinctive implementation challenges that often hinder model convergence and generalization. Accordingly, we systematically analyze these critical challenges and outline potential avenues to address them.
Sustaining the post-deployment performance of large models necessitates lightweight yet robust update mechanisms capable of incorporating new clinical cases and adapting to evolving imaging standards, all while minimizing downtime and mitigating the risk of data leakage. Future efforts should prioritize communication-efficient aggregation strategies such as sparse or quantized updates, model-parallel training schemes tailored for resource-limited sites, and federated continual learning protocols that detect distribution shifts and trigger safe model retraining. By addressing these engineering and governance challenges, FL can transition from proof-of-concept studies into a reliable framework for the responsible deployment and maintenance of next-generation AI systems across diverse healthcare settings.
Acknowledgment
The authors declare that they have no known conflicts of interest in terms of competing financial interests or personal relationships that could have an influence or are relevant to the work reported in this paper.
Reference