Progressive Size-Adaptive Federated Learning: A Comprehensive Framework for Heterogeneous Multi-Modal Data Systems

Sajid Hussain, Muhammad Sohail, Nauman Ali Khan, Naima Iltaf, and Ihtesham ul Islam

arXiv:2506.20685 · cs.LG · September 23, 2025

Open Access

TL;DR

This paper introduces Size-Based Adaptive Federated Learning (SAFL), a progressive framework that optimizes federated learning by considering dataset size and modality, leading to improved accuracy, efficiency, and insights into training dynamics across diverse data types.

Contribution

The paper presents a size-adaptive federated learning framework that organizes training by dataset size and modality. It targets the effect of dataset size on federated training dynamics, which existing FL research on model heterogeneity and aggregation has largely overlooked.

Findings

1. Optimal dataset size range of 1000-1500 samples for FL effectiveness
2. Structured data modalities outperform unstructured ones in FL
3. Performance degrades for datasets exceeding 2000 samples

Abstract

Federated Learning (FL) has emerged as a transformative paradigm for distributed machine learning while preserving data privacy. However, existing approaches predominantly focus on model heterogeneity and aggregation techniques, largely overlooking the fundamental impact of dataset size characteristics on federated training dynamics. This paper introduces Size-Based Adaptive Federated Learning (SAFL), a novel progressive training framework that systematically organizes federated learning based on dataset size characteristics across heterogeneous multi-modal data. Our comprehensive experimental evaluation across 13 diverse datasets spanning 7 modalities (vision, text, time series, audio, sensor, medical vision, and multimodal) reveals critical insights: 1) an optimal dataset size range of 1000-1500 samples for federated learning effectiveness; 2) a clear modality performance hierarchy…
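The abstract describes SAFL as organizing federated training progressively by dataset size, but the excerpt here does not give the actual staging rule. The following is only a minimal sketch of what size-based staging could look like, not the authors' implementation. The function name `progressive_stages`, the dataset names, and the sample counts are hypothetical, and the boundaries (1000, 1500, 2000) are borrowed from the thresholds in the reported findings as an assumption.

```python
from typing import Dict, List


def progressive_stages(
    dataset_sizes: Dict[str, int],
    boundaries: List[int],
) -> List[List[str]]:
    """Group dataset names into training stages by sample count.

    Stages are ordered from smallest to largest size bin. This ordering
    is an assumption for illustration, not the rule used by SAFL.
    """
    ordered_bounds = sorted(boundaries)
    # One bin below the first boundary, one above the last, one between each pair.
    stages: List[List[str]] = [[] for _ in range(len(ordered_bounds) + 1)]
    for name, n_samples in sorted(dataset_sizes.items(), key=lambda kv: kv[1]):
        # Bin index = number of boundaries the sample count meets or exceeds.
        bin_index = sum(n_samples >= b for b in ordered_bounds)
        stages[bin_index].append(name)
    # Drop empty bins so every returned stage has at least one dataset.
    return [stage for stage in stages if stage]


# Hypothetical datasets and sizes, for illustration only.
example_sizes = {"sensor_a": 800, "text_b": 1200, "vision_c": 2500, "audio_d": 1400}
example_stages = progressive_stages(example_sizes, boundaries=[1000, 1500, 2000])
print(example_stages)
```

Under these assumptions, each stage could then be trained in turn with whatever federated procedure is in use; how SAFL actually schedules or weights the stages is not stated in this excerpt.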

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Cryptography and Data Security · Stochastic Gradient Optimization Techniques