# CAMR: Coded Aggregated MapReduce

**Authors:** Konstantinos Konstantinidis, Aditya Ramamoorthy

arXiv: 1901.07418 · 2021-11-02

## TL;DR

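This paper introduces CAMR, a coded MapReduce scheme for a class of distributed algorithms, widely used in deep learning, in which intermediate computations of the same task can be combined. CAMR reduces the communication load of the shuffle phase without an exponential increase in the number of jobs or data subfiles, which improves scalability.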

## Contribution

CAMR achieves the same communication load as state-of-the-art schemes while keeping both the number of jobs and the number of subfiles small. Prior schemes reach that load only with a number of jobs that grows exponentially in the system parameters.

## Key findings

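- Reduces the communication load of the shuffle phase, which often dominates job execution time in MapReduce-like systems.
- Keeps the number of jobs and the number of data subfiles small, avoiding the exponential growth required by prior techniques.
- Matches the communication load of state-of-the-art schemes.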

## Abstract

Many big data algorithms executed on MapReduce-like systems have a shuffle phase that often dominates the overall job execution time. Recent work has demonstrated schemes where the communication load in the shuffle phase can be traded off for the computation load in the map phase. In this work, we focus on a class of distributed algorithms, broadly used in deep learning, where intermediate computations of the same task can be combined. Even though prior techniques reduce the communication load significantly, they require a number of jobs that grows exponentially in the system parameters. This limitation is crucial and may diminish the load gains as the algorithm scales. We propose a new scheme which achieves the same load as the state-of-the-art while ensuring that the number of jobs as well as the number of subfiles that the data set needs to be split into remain small.
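The abstract names two ingredients: combining intermediate values of the same task before shuffling, and coding shuffle transmissions so that one message serves several nodes. The toy sketch below illustrates both. It is not the CAMR construction from the paper. Its assumptions are K = 3 nodes, job q reduced at node q, six made-up subfiles grouped into three batches with each batch stored at two nodes, a hypothetical map function, and summation as the aggregation. All names (`map_value`, `aggregate`, `seg`, `shuffle_and_reduce`) are hypothetical.

```python
# Toy illustration (NOT the CAMR construction): aggregation + XOR-coded shuffle.
K = 3  # nodes; job q is reduced at node q (assumption for this toy)

# Hypothetical data set: 6 subfiles grouped into K batches of 2 subfiles each.
SUBFILES = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12]]
BATCH = {b: [2 * b, 2 * b + 1] for b in range(K)}

# Each batch is stored at 2 nodes: node k stores every batch except batch k.
STORES = {k: [b for b in range(K) if b != k] for k in range(K)}


def map_value(job, n):
    """Hypothetical map function: value of `job` on subfile n (fits in 32 bits)."""
    return (job + 1) * sum(SUBFILES[n])


def aggregate(job, batch):
    """Combine one job's values over a whole batch before shuffling."""
    return sum(map_value(job, n) for n in BATCH[batch])


def split(v):
    """Split a 32-bit value into two 16-bit segments [high, low]."""
    return [(v >> 16) & 0xFFFF, v & 0xFFFF]


def seg(receiver, sender):
    """Index of the half of the receiver's missing value that this sender delivers."""
    others = [x for x in range(K) if x != receiver]
    return others.index(sender)


def shuffle_and_reduce():
    """Each node multicasts one XOR of two half-values; every node then reduces its job."""
    received = {k: [None, None] for k in range(K)}
    sent_segments = 0
    for s in range(K):
        a, b = [x for x in range(K) if x != s]  # the two other nodes (K = 3 only)
        # Sender s stores batches a and b, so it can form both aggregates.
        msg = split(aggregate(a, a))[seg(a, s)] ^ split(aggregate(b, b))[seg(b, s)]
        sent_segments += 1
        # Each receiver cancels the term it can compute from its own storage.
        for rx, other in ((a, b), (b, a)):
            received[rx][seg(rx, s)] = msg ^ split(aggregate(other, other))[seg(other, s)]
    results = {}
    for k in range(K):
        missing = (received[k][0] << 16) | received[k][1]  # rebuild the missing aggregate
        local = sum(aggregate(k, bt) for bt in STORES[k])  # batches node k stores
        results[k] = local + missing
    return results, sent_segments / 2  # load measured in full-size values


results, load = shuffle_and_reduce()
print(results)  # {0: 78, 1: 156, 2: 234}
print(load)     # 1.5
```

In this toy setting, each node lacks two subfiles for its own job, so an uncoded shuffle without aggregation moves six values. Aggregating per batch cuts that to three, and the XOR multicasts halve it again to 1.5 (three half-size messages). Scaling the pattern to larger systems requires a systematic assignment of subfiles and jobs to nodes, and keeping the job and subfile counts of that assignment small is the problem the abstract describes.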

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1901.07418/full.md

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/1901.07418/full.md

## References

14 references — full list in the complete paper: https://tomesphere.com/paper/1901.07418/full.md

---
Source: https://tomesphere.com/paper/1901.07418