# Analyzing the benefits of communication channels between deep learning models

**Authors:** Philippe Lacaille

arXiv: 1904.09211 · 2019-04-22

## TL;DR

This work investigates how different levels of communication between deep learning models affect training efficiency and performance. It covers three settings: low-bandwidth output exchanges, the sharing of training instructions, and the exchange of a purposefully crafted language.

## Contribution

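It analyzes three communication protocols between models and their effect on training performance. The first exchanges low-bandwidth outputs between compute nodes during parallel training. The second lets a pre-trained "teacher" model send training instructions to a randomly initialized model. The third lets two models exchange a purposefully crafted language, which can be optimized in different ways.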

## Key findings

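- When the parallel computations of training procedures such as synchronous and asynchronous SGD are decentralized, exchanging low-bandwidth outputs between compute nodes can be beneficial.
- A pre-trained teacher model that customizes a randomly initialized model's training procedure can accelerate that model's learning.
- Two models can communicate through a purposefully crafted language, and different ways of optimizing that language are explored.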
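A back-of-the-envelope comparison makes the bandwidth argument concrete. The sketch computes what one node would send per step in two cases. In the first, it shares a full float32 gradient, as in synchronous data-parallel SGD. In the second, it shares only its predicted class probabilities for a batch. The model size, batch size, class count and float32 encoding are hypothetical values chosen for illustration, not figures from the paper. The comparison also ignores compression and how often each exchange happens.

```python
# Hypothetical sizes, chosen only to illustrate the order of magnitude involved.
BYTES_PER_FLOAT32 = 4


def gradient_exchange_bytes(num_params):
    """Bytes one node sends per step if it shares a full float32 gradient."""
    return num_params * BYTES_PER_FLOAT32


def output_exchange_bytes(batch_size, num_classes):
    """Bytes one node sends per step if it shares float32 class probabilities for a batch."""
    return batch_size * num_classes * BYTES_PER_FLOAT32


if __name__ == "__main__":
    grad = gradient_exchange_bytes(25_000_000)  # hypothetical ~25M-parameter model
    out = output_exchange_bytes(128, 1000)      # hypothetical batch of 128, 1000 classes
    print(grad, out, grad / out)  # → 100000000 512000 195.3125
```

The gradient cost grows with model size. The output cost grows only with batch size and the number of outputs, which is why output-level communication stays cheap as models get bigger.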

## Abstract

As artificial intelligence systems spread to larger and more diverse tasks in many domains, machine learning algorithms, and in particular deep learning models and the databases required to train them, are getting bigger themselves. Some algorithms allow large computations to scale by leveraging data parallelism. However, they often require a large amount of data to be exchanged to ensure that the knowledge shared across compute nodes stays accurate.

In this work, the effect of different levels of communication between deep learning models is studied, in particular how it affects performance. The first approach looks at decentralizing the numerous computations that are done in parallel in training procedures such as synchronous and asynchronous stochastic gradient descent. In this setting, a simplified communication that consists of exchanging low-bandwidth outputs between compute nodes can be beneficial. In the following chapter, the communication protocol is slightly modified to also include training instructions. This is studied in a simplified setup where a pre-trained model, analogous to a teacher, can customize a randomly initialized model's training procedure to accelerate learning. Finally, a communication channel through which two deep learning models can exchange a purposefully crafted language is explored, allowing for different ways of optimizing that language.
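The abstract does not say what form the exchanged "low-bandwidth outputs" take. One common way to realize output-level communication is codistillation, assumed here purely for illustration. It is not confirmed to be the author's protocol. In codistillation, each node trains on its own data and adds a term that pulls its predictions toward a peer's predicted class probabilities. Nodes therefore exchange probability vectors rather than full gradients. The pure-Python sketch computes that per-example loss. `codistillation_loss` and its `alpha` weight are hypothetical names, not taken from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target_probs, probs):
    """Cross-entropy H(target, probs); zero-probability targets contribute nothing."""
    return -sum(t * math.log(p) for t, p in zip(target_probs, probs) if t > 0)


def codistillation_loss(own_logits, label, peer_probs, alpha=0.5):
    """Hypothetical per-example loss for one node (assumed codistillation-style setup).

    own_logits: this node's logits for the example.
    label: integer ground-truth class.
    peer_probs: class probabilities received from another node; this is the
        only thing communicated (num_classes floats per example).
    alpha: weight of the term pulling this node toward its peer.
    """
    probs = softmax(own_logits)
    one_hot = [1.0 if i == label else 0.0 for i in range(len(probs))]
    supervised = cross_entropy(one_hot, probs)  # usual loss on the node's own labels
    agreement = cross_entropy(peer_probs, probs)  # match the peer's predictions
    return supervised + alpha * agreement


if __name__ == "__main__":
    # Two classes, uninformative logits and an uninformative peer:
    # both terms equal ln 2, so the loss is 1.5 * ln 2 with alpha = 0.5.
    print(codistillation_loss([0.0, 0.0], 0, [0.5, 0.5]))
```

With `alpha=0` the loss reduces to ordinary cross-entropy on the node's own labels. Raising `alpha` makes the node rely more on what its peer communicates.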

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1904.09211/full.md

## Figures

12 figures with captions in the complete paper: https://tomesphere.com/paper/1904.09211/full.md

## References

38 references — full list in the complete paper: https://tomesphere.com/paper/1904.09211/full.md

---
Source: https://tomesphere.com/paper/1904.09211