CoCoI: Distributed Coded Inference System for Straggler Mitigation
Xing Liu, Chao Huang, Ming Tang

TL;DR
CoCoI is a distributed coded inference system for CNNs that mitigates stragglers and device failures, reducing latency by up to 34.2% through adaptive redundancy and optimal task splitting.
Contribution
CoCoI introduces a distributed coded inference framework for CNNs that accounts for data dependencies when splitting convolutional layers and optimizes the task split to improve robustness and reduce latency.
Findings
Reduces inference latency by up to 34.2% in the presence of stragglers.
Effectively mitigates device failures in distributed CNN inference.
Approximate strategy closely matches the optimal solution in experiments.
Abstract
Convolutional neural networks (CNNs) are widely applied in real-time applications on resource-constrained devices. To accelerate CNN inference, prior work has proposed distributing the inference workload across multiple devices. However, these approaches do not address stragglers and device failures in distributed inference, which are challenging to handle because the devices' computation and communication capacities are time-varying and possibly unknown. To address this, we propose a distributed coded inference system called CoCoI. It splits the convolutional layers of a CNN, taking into account the data dependency of high-dimensional inputs and outputs, and then adapts coding schemes to generate task redundancy. With CoCoI, the inference results can be determined once a subset of devices complete their subtasks, improving robustness against stragglers and failures. To theoretically analyze the tradeoff between redundancy and…
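The idea of recovering a convolution output from only a subset of workers can be illustrated with a minimal sketch. It is not the authors' implementation: this summary does not give the coding scheme, the splitting dimension, or the splitting strategy, so the Vandermonde (MDS-style) code, the width-wise split, and every function name below are assumptions chosen for illustration. The sketch relies only on the fact that a bias-free convolution is linear, so a linear combination of input slices yields the same linear combination of output slices.

```python
import numpy as np

def conv2d_valid(x, w):
    """Naive stride-1 'valid' 2D cross-correlation; x is (H, W), w is (kh, kw)."""
    kh, kw = w.shape
    out = np.empty((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * w)
    return out

def split_with_overlap(x, kernel_w, k):
    """Split x along the width into k equally sized slices (hypothetical helper).

    Adjacent slices overlap by kernel_w - 1 columns: this is the data
    dependency that makes each slice's output a disjoint tile of the full output.
    """
    out_w = x.shape[1] - kernel_w + 1
    assert out_w % k == 0, "sketch assumes the output width divides evenly"
    step = out_w // k
    return [x[:, p * step: p * step + step + kernel_w - 1] for p in range(k)]

def encode(slices, n):
    """Build n coded inputs from k slices with an (n, k) Vandermonde generator.

    Any k rows of G are invertible because the evaluation points are distinct.
    """
    k = len(slices)
    G = np.vander(np.arange(1, n + 1, dtype=float), k, increasing=True)
    coded = [sum(G[i, p] * slices[p] for p in range(k)) for i in range(n)]
    return G, coded

def decode(G, results, k):
    """Recover the full output from any k finished workers.

    results maps worker index -> coded output of that worker.
    """
    ids = sorted(results)[:k]
    Y = np.stack([results[i] for i in ids])              # (k, h, w_tile)
    parts = np.tensordot(np.linalg.inv(G[ids]), Y, axes=1)
    return np.concatenate(list(parts), axis=1)           # stitch tiles by width

# Usage: k = 2 source slices, n = 4 workers, workers 0 and 2 straggle.
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 12))
w = rng.standard_normal((3, 3))
G, coded = encode(split_with_overlap(x, kernel_w=3, k=2), n=4)
finished = {i: conv2d_valid(coded[i], w) for i in (1, 3)}
recovered = decode(G, finished, k=2)
print(np.allclose(recovered, conv2d_valid(x, w)))
```

In this sketch, raising n relative to k tolerates more stragglers at the cost of more total computation, which is the kind of redundancy tradeoff the abstract refers to. A real-valued Vandermonde matrix becomes ill-conditioned as k grows, so a practical system would likely need a better-conditioned code.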
