Self-Supervised Federated Learning under Data Heterogeneity for Label-Scarce Diatom Classification
Mingkun Tan, Xilu Wang, Michael Kloster, Tim W. Nattkemper

TL;DR
This paper investigates self-supervised federated learning for diatom classification under stage-specific data heterogeneity: cross-site variation in unlabeled data volume during pre-training and label-space misalignment during fine-tuning. It proposes a new partitioning scheme and an adaptive method to improve performance.
Contribution
The paper introduces PreDi for controllable heterogeneity simulation and PreP-WFL for adaptive class representation enhancement, advancing the understanding of how heterogeneity affects federated learning.
Findings
Heterogeneity in unlabeled data volume improves pre-training.
Prevalence dominates performance under label-space heterogeneity.
PreP-WFL mitigates performance degradation in low-prevalence scenarios.
Abstract
Label-scarce visual classification under decentralized and heterogeneous data is a fundamental challenge in pattern recognition, especially when sites exhibit partially overlapping class sets. While self-supervised federated learning (SSFL) offers a promising solution, existing studies commonly assume the same data heterogeneity pattern throughout pre-training and fine-tuning. Moreover, current partitioning schemes often fail to generate pure partially class-disjoint data settings, limiting controllable simulation of real-world label-space heterogeneity. In this work, we introduce SSFL for diatom classification as a representative real-world instance and systematically investigate stage-specific data heterogeneity. We study cross-site variation in unlabeled data volume during pre-training and label-space misalignment during downstream fine-tuning. To study the latter in a controllable…
