# Federated Learning for Histopathology Image Classification: A Systematic Review

**Authors:** Meriem Touhami, Mohammad Faizal Ahmad Fauzi, Zaka Ur Rehman, Sarina Mansor

PMC · DOI: 10.3390/diagnostics16010137 · Diagnostics · 2026-01-01

## TL;DR

This systematic review examines how federated learning is used to classify histopathology images without sharing patient data, and it identifies key challenges and future directions.

## Contribution

A systematic review of federated learning applications in histopathology image classification, highlighting methodologies, datasets, and performance trends.

## Key findings

- The 24 included studies reported FL classification accuracies ranging from 69.37% to 99.72%.
- FedAvg was the most common aggregation algorithm (14 studies); VGG, ResNet, DenseNet, and EfficientNet were the most frequently used model architectures.
- Key challenges include communication overhead, computational demands, and inconsistent reporting standards.

## Abstract

Background/Objective: The integration of machine learning (ML) and deep learning (DL) has significantly enhanced medical image classification, especially in histopathology, by improving diagnostic accuracy and aiding clinical decision making. However, data privacy concerns and restrictions on sharing patient data limit the development of effective DL models. Federated learning (FL) offers a promising solution by enabling collaborative model training across institutions without exposing sensitive data. This systematic review aims to comprehensively evaluate the current state of FL applications in histopathological image classification by identifying prevailing methodologies, datasets, and performance metrics and highlighting existing challenges and future research directions.

Methods: Following PRISMA guidelines, 24 studies published between 2020 and 2025 were analyzed. The literature was retrieved from ScienceDirect, IEEE Xplore, MDPI, Springer Nature Link, PubMed, and arXiv. Eligible studies focused on FL-based deep learning models for histopathology image classification with reported performance metrics. Studies unrelated to FL in histopathology or lacking accessible full texts were excluded.

Results: The included studies utilized 10 datasets (8 public, 1 private, and 1 unspecified) and reported classification accuracies ranging from 69.37% to 99.72%. FedAvg was the most commonly used aggregation algorithm (14 studies), followed by FedProx, FedDropoutAvg, and custom approaches. Only two studies reported their FL frameworks (Flower and OpenFL). Frequently employed model architectures included VGG, ResNet, DenseNet, and EfficientNet. Performance was typically evaluated using accuracy, precision, recall, and F1-score. Federated learning demonstrates strong potential for privacy-preserving digital pathology applications. However, key challenges remain, including communication overhead, computational demands, and inconsistent reporting standards. Addressing these issues is essential for broader clinical adoption.

Conclusions: Future work should prioritize standardized evaluation protocols, efficient aggregation methods, model personalization, robustness, and interpretability, with validation across multi-institutional clinical environments to fully realize the benefits of FL in histopathological image classification.
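For readers unfamiliar with FedAvg, the aggregation algorithm the review reports as most common, the following is a minimal sketch of its server-side averaging step. It is not taken from the paper or from any of the reviewed studies. It assumes each institution sends back its locally trained parameters as flat lists together with its local sample count. The `fedavg` function, the `Weights` format, and the hospital values are hypothetical and chosen only for illustration.

```python
from typing import Dict, List

# Hypothetical parameter format: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def fedavg(client_weights: List[Weights], client_sizes: List[int]) -> Weights:
    """Average client parameters, weighting each client by its sample count."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    aggregated: Weights = {}
    for name in client_weights[0]:
        length = len(client_weights[0][name])
        # Weighted mean of each parameter entry across all clients.
        aggregated[name] = [
            sum(w[name][i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(length)
        ]
    return aggregated


# Three hypothetical hospitals with different numbers of local image patches.
hospital_updates = [
    {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]},
    {"layer.weight": [3.0, 4.0], "layer.bias": [1.0]},
    {"layer.weight": [5.0, 6.0], "layer.bias": [2.0]},
]
patch_counts = [100, 300, 600]

global_weights = fedavg(hospital_updates, patch_counts)
print(global_weights)
```

Only model parameters and sample counts reach the server in this sketch, and the images themselves stay at each site. In practice, a full round would also include broadcasting the global model and running local training, which frameworks such as Flower or OpenFL handle.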

## Full-text entities

- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12785327/full.md

## Figures

9 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12785327/full.md

## References

63 references — full list in the complete paper: https://tomesphere.com/paper/PMC12785327/full.md

---
Source: https://tomesphere.com/paper/PMC12785327