Federated Learning Framework for Scalable AI in Heterogeneous HPC and Cloud Environments
Sangam Ghimire, Paribartan Timalsina, Nirjal Bhurtel, Bishal Neupane, Bigyan Byanju Shrestha, Subarna Bhattarai, Prajwal Gaire, Jessica Thapa, Sudan Jha

TL;DR
This paper introduces a federated learning framework designed for efficient, scalable AI deployment across heterogeneous HPC and cloud environments, addressing challenges such as system heterogeneity, communication cost, and data privacy.
Contribution
The paper presents a federated learning system optimized for mixed HPC and cloud infrastructures, improving scalability, fault tolerance, and convergence under diverse hardware and data conditions.
Findings
Demonstrates high scalability and fault tolerance in hybrid environments.
Achieves effective model convergence with non-IID data distributions.
Maintains data privacy while optimizing communication overhead.
Abstract
As the demand grows for scalable and privacy-aware AI systems, Federated Learning (FL) has emerged as a promising solution, allowing decentralized model training without moving raw data. At the same time, the combination of high-performance computing (HPC) and cloud infrastructure offers vast computing power but introduces new complexities, especially when dealing with heterogeneous hardware, communication limits, and non-uniform data. In this work, we present a federated learning framework built to run efficiently across mixed HPC and cloud environments. Our system addresses key challenges such as system heterogeneity, communication overhead, and resource scheduling, while maintaining model accuracy and data privacy. Through experiments on a hybrid testbed, we demonstrate strong performance in terms of scalability, fault tolerance, and convergence, even under non-Independent and…
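The abstract describes training a shared model without moving raw data, but it does not state which aggregation rule the framework uses. For orientation only, the sketch below shows the widely used Federated Averaging (FedAvg) pattern. It is not the authors' method. Each client runs a few gradient steps on its own private data, and the server combines the returned parameters, weighting each client by its sample count. The linear model, learning rate, epoch count, and the names `fedavg`, `local_update`, and `federated_round` are illustrative assumptions.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[List[float], float]  # (feature vector x, target y)


def fedavg(client_weights: Sequence[Sequence[float]],
           client_sizes: Sequence[int]) -> List[float]:
    """Server-side FedAvg: average client parameters weighted by local sample count."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need at least one client and one size per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    dim = len(client_weights[0])
    if any(len(w) != dim for w in client_weights):
        raise ValueError("all clients must share the same model shape")
    global_w = [0.0] * dim
    for w, n in zip(client_weights, client_sizes):
        for i, value in enumerate(w):
            global_w[i] += (n / total) * value
    return global_w


def local_update(weights: Sequence[float], data: Sequence[Sample],
                 lr: float = 0.05, epochs: int = 5) -> List[float]:
    """Client-side step (hypothetical): full-batch gradient descent on MSE for y ~ w.x.

    Only the updated parameters are returned; the raw (x, y) pairs stay on the client.
    """
    w = list(weights)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in data:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            for i, xi in enumerate(x):
                grad[i] += 2.0 * err * xi / len(data)
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w


def federated_round(global_w: Sequence[float],
                    client_datasets: Sequence[Sequence[Sample]],
                    lr: float = 0.05, epochs: int = 5) -> List[float]:
    """One round: broadcast the global model, train locally, aggregate with FedAvg."""
    updates = [local_update(global_w, d, lr, epochs) for d in client_datasets]
    sizes = [len(d) for d in client_datasets]
    return fedavg(updates, sizes)


if __name__ == "__main__":
    # Two clients hold differently distributed inputs (a toy non-IID split) of y = 2x.
    clients = [
        [([1.0], 2.0), ([2.0], 4.0)],  # client A: small x values
        [([3.0], 6.0)],                # client B: a single larger x value
    ]
    w = [0.0]
    for _ in range(20):
        w = federated_round(w, clients)
    print(w)  # should approach [2.0]
```

Weighting by sample count makes the aggregate match the average over all data when clients share one objective. Under strongly non-IID data, local models drift apart between rounds. This is one reason heterogeneity-aware variants exist alongside plain FedAvg.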
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Big Data and Digital Economy · IoT and Edge/Fog Computing
