Flexible Coded Distributed Convolution Computing for Enhanced Straggler Resilience and Numerical Stability in Distributed CNNs
Shuo Tan, Rui Liu, Xuesong Han, XianLei Long, Kai Wan, Linqi Song, Yong Li

TL;DR
This paper presents FCDCC, a framework that improves straggler resilience and numerical stability in distributed CNNs by combining a numerically stable coded tensor convolution scheme (NSCTC) with two coded partitioning schemes: APCP for the input tensor and KCCP for the filter tensor.
Contribution
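The paper introduces the FCDCC framework, extends Coded Distributed Computing (CDC) with Circulant and Rotation Matrix Embedding (CRME) from matrix multiplication to high-dimensional tensor convolution, and proposes two new coded partitioning schemes (APCP and KCCP), with the aim of improving robustness and efficiency in distributed CNNs.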
Findings
Enhanced straggler resilience demonstrated in experiments.
Improved numerical stability in distributed CNN computations.
Scalability across various CNN architectures validated.
Abstract
Deploying Convolutional Neural Networks (CNNs) on resource-constrained devices necessitates efficient management of computational resources, often via distributed environments susceptible to latency from straggler nodes. This paper introduces the Flexible Coded Distributed Convolution Computing (FCDCC) framework to enhance straggler resilience and numerical stability in distributed CNNs. We extend Coded Distributed Computing (CDC) with Circulant and Rotation Matrix Embedding (CRME), which was originally proposed for matrix multiplication, to high-dimensional tensor convolution. For the proposed scheme, referred to as the Numerically Stable Coded Tensor Convolution (NSCTC) scheme, we also propose two new coded partitioning schemes: Adaptive-Padding Coded Partitioning (APCP) for the input tensor and Kernel-Channel Coded Partitioning (KCCP) for the filter tensor. These strategies enable…
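The general idea behind coded distributed convolution can be illustrated with a minimal sketch. This is not the paper's NSCTC scheme: it substitutes a plain real-valued Vandermonde code for CRME, and it simply splits the filter tensor along the output-channel axis, which is an assumption made for illustration rather than a description of KCCP. All function names (`conv2d`, `encode_filters`, `decode`) and the toy sizes are hypothetical. The sketch relies only on the fact that convolution is linear in the filters, so the results of any k of the n workers are enough to recover the full output.

```python
import numpy as np

def conv2d(x, w):
    """Valid cross-correlation: x is (C, H, W), w is (N, C, kh, kw)."""
    n_out, _, kh, kw = w.shape
    _, h, wd = x.shape
    out = np.zeros((n_out, h - kh + 1, wd - kw + 1))
    for i in range(out.shape[1]):
        for j in range(out.shape[2]):
            patch = x[:, i:i + kh, j:j + kw]
            # Contract channel and kernel axes, leaving one value per filter.
            out[:, i, j] = np.tensordot(w, patch, axes=([1, 2, 3], [0, 1, 2]))
    return out

def encode_filters(w, k, n):
    """Split filters into k blocks and mix them into n coded blocks.

    Assumption: a real Vandermonde generator matrix stands in for CRME.
    """
    blocks = np.stack(np.split(w, k, axis=0))      # (k, N/k, C, kh, kw)
    nodes = np.linspace(-1.0, 1.0, n)              # n distinct evaluation points
    G = np.vander(nodes, k, increasing=True)       # (n, k) generator matrix
    coded = np.tensordot(G, blocks, axes=(1, 0))   # (n, N/k, C, kh, kw)
    return G, coded

def decode(G, results, survivors):
    """Recover the uncoded output from any k surviving worker results."""
    k = G.shape[1]
    ids = sorted(survivors)[:k]
    Y = np.stack([results[i] for i in ids])        # (k, N/k, H', W')
    flat = np.linalg.solve(G[ids], Y.reshape(k, -1))
    return flat.reshape(Y.shape).reshape(-1, *Y.shape[2:])

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 8, 8))       # toy input tensor
w = rng.standard_normal((4, 3, 3, 3))    # toy filter tensor
k, n = 2, 4                              # any 2 of 4 workers suffice

G, coded = encode_filters(w, k, n)
results = {i: conv2d(x, coded[i]) for i in range(n)}  # one task per worker

survivors = [1, 3]                       # workers 0 and 2 are stragglers
decoded = decode(G, results, survivors)
reference = conv2d(x, w)
max_error = float(np.max(np.abs(decoded - reference)))
print("max abs error:", max_error)
print("condition number:", np.linalg.cond(G[survivors]))
```

Real Vandermonde matrices become increasingly ill-conditioned as the number of workers grows, so the linear solve in `decode` amplifies rounding error at larger scales. That is the kind of numerical stability concern the abstract says the CRME-based scheme is meant to address.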
Taxonomy
Topics: Advanced Memory and Neural Computing · Adversarial Robustness in Machine Learning · Brain Tumor Detection and Classification
Methods: Convolution
