# Coded convolution for parallel and distributed computing within a deadline

**Authors:** Sanghamitra Dutta, Viveck Cadambe, Pulkit Grover

arXiv: 1705.03875 · 2017-05-11

## TL;DR

This paper introduces a coding-based approach to distributed convolution that enhances resilience against slow or faulty processors, significantly increasing the probability of completing computations within strict deadlines.

## Contribution

The paper develops an asymptotic failure-exponent analysis for coded distributed computing. The analysis yields simple closed-form expressions and shows that coding improves the probability of finishing within a deadline compared with replication-based schemes.

## Key findings

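- Splitting the vectors into pieces and encoding them with a linear code gives better straggler resilience than replication under a worst-case straggler analysis.
- Under commonly used models of computation time, coding dramatically increases the probability of finishing within a target deadline.
- Closed-form expressions are derived for general computation-time models, including shifted Weibull as well as shifted exponential.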
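
To make the deadline claim concrete, here is a minimal sketch of the probability of missing a deadline `t` when each worker's finishing time is i.i.d. shifted exponential. It is a toy model and not the paper's exact analysis. The function names, the i.i.d. assumption, and the choice to give both strategies the same `shift` and `rate` are illustrative assumptions; in practice the per-worker workload differs between strategies, which the caller can reflect by passing different parameters.

```python
from math import comb, exp

def worker_miss(t, shift, rate):
    """P(a single worker has not finished by time t), shifted exponential model."""
    return 1.0 if t <= shift else exp(-rate * (t - shift))

def miss_uncoded(t, n, shift, rate):
    """All n workers are needed, so a single late worker misses the deadline."""
    p = worker_miss(t, shift, rate)
    return 1.0 - (1.0 - p) ** n

def miss_coded(t, n, k, shift, rate):
    """Any k of n workers suffice; the deadline is missed only if more than n - k are late."""
    p = worker_miss(t, shift, rate)
    return sum(comb(n, s) * p**s * (1 - p) ** (n - s)
               for s in range(n - k + 1, n + 1))
```

In this toy model the coded miss probability is dominated, for large `t`, by the event that exactly `n - k + 1` workers are late, so it decays roughly like `exp(-(n - k + 1) * rate * t)`, while the uncoded one decays like `exp(-rate * t)`. That gap in decay rates is the kind of tail behavior an exponent metric is meant to capture.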

## Abstract

We consider the problem of computing the convolution of two long vectors using parallel processing units in the presence of "stragglers". Stragglers refer to the small fraction of faulty or slow processors that delays the entire computation in time-critical distributed systems. We first show that splitting the vectors into smaller pieces and using a linear code to encode these pieces provides better resilience against stragglers than replication-based schemes under a simple, worst-case straggler analysis. We then demonstrate that under commonly used models of computation time, coding can dramatically improve the probability of finishing the computation within a target "deadline" time. As opposed to the more commonly used technique of expected computation time analysis, we quantify the exponents of the probability of failure in the limit of large deadlines. Our exponent metric captures the probability of failing to finish before a specified deadline time, i.e., the behavior of the "tail". Moreover, our technique also allows for simple closed form expressions for more general models of computation time, e.g., shifted Weibull models instead of only shifted exponentials. Thus, through this problem of coded convolution, we establish the utility of a novel asymptotic failure exponent analysis for distributed systems.
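
The idea of splitting a vector into pieces and encoding them with a linear code can be illustrated with a short sketch. It is a simplified version under stated assumptions, not the paper's construction: only one vector (`a`) is split, the code is a real-valued Vandermonde matrix with evaluation points 1..n, and all function names are hypothetical. It relies on the linearity of convolution, so a coded combination of pieces convolved with `x` equals the same combination of the individual convolutions.

```python
import numpy as np

def encode(pieces, n):
    """Encode k equal-length pieces into n coded pieces (assumed Vandermonde code)."""
    k = len(pieces)
    nodes = np.arange(1, n + 1, dtype=float)      # distinct evaluation points
    G = np.vander(nodes, k, increasing=True)      # n x k; any k rows are invertible
    return G, G @ np.stack(pieces)                # one coded piece (row) per worker

def decode(G, results, finished):
    """Recover the k uncoded partial convolutions from any k finished workers."""
    k = G.shape[1]
    idx = list(finished)[:k]
    return np.linalg.solve(G[idx], np.stack([results[i] for i in idx]))

def coded_convolve(a, x, k, n, finished):
    """Convolve a with x using only the workers listed in `finished` (at least k)."""
    pieces = np.split(a, k)                       # requires len(a) divisible by k
    G, coded = encode(pieces, n)
    # Each finished worker convolves its coded piece with x; stragglers never report.
    results = {i: np.convolve(coded[i], x) for i in finished}
    partial = decode(G, results, finished)        # row j is conv(pieces[j], x)
    # Overlap-add the partial convolutions at their offsets in the full output.
    m = len(a) // k
    out = np.zeros(len(a) + len(x) - 1)
    for j in range(k):
        out[j * m : j * m + m + len(x) - 1] += partial[j]
    return out
```

For example, with `k = 3` pieces and `n = 5` workers, `coded_convolve(a, x, 3, 5, finished=[0, 2, 4])` reproduces `np.convolve(a, x)` even though workers 1 and 3 never respond. A real-valued Vandermonde code is convenient for a sketch but can become ill-conditioned as `n` and `k` grow.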

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1705.03875/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/1705.03875/full.md

## References

20 references — full list in the complete paper: https://tomesphere.com/paper/1705.03875/full.md

---
Source: https://tomesphere.com/paper/1705.03875