Distributed Learning and Inference Systems: A Networking Perspective

Hesham G. Moussa; Arashmid Akhavain; S. Maryam Hosseini; Bill McCormick

arXiv:2501.05323 · cs.LG · June 2, 2025

TL;DR

This paper discusses the shift from centralized to distributed AI systems, proposing a novel framework called DA-ITN to address the complexities and challenges of decentralized machine learning and inference.

Contribution

The paper introduces the DA-ITN framework, an approach to distributed AI training and inference designed to manage the added complexity of decentralized systems.

Findings

01. Proposes the DA-ITN framework for distributed AI
02. Explores components and functions of DA-ITN
03. Highlights challenges and research directions in distributed AI

Abstract

Machine learning models have achieved, and in some cases surpassed, human-level performance in various tasks, mainly through centralized training of static models and the use of large models stored in centralized clouds for inference. However, this centralized approach has several drawbacks, including privacy concerns, high storage demands, a single point of failure, and significant computing requirements. These challenges have driven interest in developing alternative decentralized and distributed methods for AI training and inference. Distribution introduces additional complexity, as it requires managing multiple moving parts. To address these complexities and fill a gap in the development of distributed AI systems, this work proposes a novel framework, Data and Dynamics-Aware Inference and Training Networks (DA-ITN). The different components of DA-ITN and their functions are…

Taxonomy

Topics: Distributed and Parallel Computing Systems · Access Control and Trust · Semantic Web and Ontologies