Coded Computing for Resilient Distributed Computing: A Learning-Theoretic Framework
Parsa Moradi, Behrooz Tahmasebi, Mohammad Ali Maddah-Ali

TL;DR
This paper introduces a learning-theoretic framework for coded computing in distributed machine learning, optimizing encoder and decoder functions to improve resilience and accuracy in the presence of slow or faulty servers.
Contribution
It develops a novel learning-based approach to coded computing, bridging the gap between coding theory and machine learning workloads, with explicit optimal encoder-decoder derivation.
Findings
The error decay rate improves as the number of workers grows
The framework outperforms state-of-the-art methods in accuracy
It is effective in both noisy and noiseless settings
Abstract
Coded computing has emerged as a promising framework for tackling significant challenges in large-scale distributed computing, including the presence of slow, faulty, or compromised servers. In this approach, each worker node processes a combination of the data, rather than the raw data itself. The final result is then decoded from the collective outputs of the worker nodes. However, there is a significant gap between current coded computing approaches and the broader landscape of general distributed computing, particularly when it comes to machine learning workloads. To bridge this gap, we propose a novel foundation for coded computing, integrating the principles of learning theory and developing a framework that adapts seamlessly to machine learning applications. In this framework, the objective is to find the encoder and decoder functions that minimize the loss function, defined…
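To make the encode, compute, decode pipeline described in the abstract concrete, here is a minimal sketch of the classical linear version of coded computing. It is not the learned encoder and decoder this paper proposes. It assumes a linear worker function (a fixed matrix-vector product), two data blocks, and three workers, so that any one straggler can be tolerated. All names here (`A`, `f`, `encode`, `decode`) are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

Vector = List[float]

# Hypothetical fixed matrix; f(x) = A x stands in for the worker's task.
A = [[1.0, 2.0],
     [3.0, 4.0]]


def f(x: Vector) -> Vector:
    """Linear function applied by every worker (matrix-vector product)."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def encode(x1: Vector, x2: Vector) -> List[Vector]:
    """Produce three coded inputs: x1, x2, and the combination x1 + x2."""
    return [x1, x2, [a + b for a, b in zip(x1, x2)]]


def decode(results: Dict[int, Vector]) -> Tuple[Vector, Vector]:
    """Recover (f(x1), f(x2)) from the outputs of any two of the three workers.

    Relies on linearity: f(x1 + x2) = f(x1) + f(x2).
    """
    if len(results) < 2:
        raise ValueError("need outputs from at least two workers")
    if 0 in results and 1 in results:
        return results[0], results[1]
    if 0 in results:
        # Have f(x1) and f(x1 + x2); subtract to get f(x2).
        return results[0], [s - a for s, a in zip(results[2], results[0])]
    # Have f(x2) and f(x1 + x2); subtract to get f(x1).
    return [s - b for s, b in zip(results[2], results[1])], results[1]


# Worker 0 is a straggler: its output never arrives.
coded_inputs = encode([1.0, 0.0], [0.0, 1.0])
outputs = {i: f(c) for i, c in enumerate(coded_inputs)}
del outputs[0]
y1, y2 = decode(outputs)
```

This scheme recovers the results exactly, but only because `f` is linear. According to the abstract, the framework in this paper instead chooses the encoder and decoder to minimize a loss function, with the aim of fitting machine learning workloads.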
Taxonomy
Topics: Online Learning and Analytics · Innovative Teaching and Learning Methods
