# "Parallel Training Considered Harmful?": Comparing series-parallel and parallel feedforward network training

**Authors:** Antônio H. Ribeiro, Luis A. Aguirre

arXiv: 1706.07119 · 2019-05-06

## TL;DR

This paper compares series-parallel and parallel training of neural network models of dynamic systems. It shows that parallel training can be more robust and better suited to realistic scenarios, and that the two methods have similar computational costs, although series-parallel training is easier to parallelize.

## Contribution

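It places both training methods in a single mathematical framework and uses simulation studies, plus one example with measured data, to identify situations where each method gives better validation results. Parallel training comes out ahead in the scenarios the authors consider more realistic.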

## Key findings

- Parallel training gives more robust estimates in the presence of noise, in the scenarios the authors consider more realistic.
- Both methods have similar computational costs.
- Series-parallel training is more amenable to parallelization.
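The parallelization finding follows from how each configuration forms its predictions. Series-parallel (one-step-ahead) prediction at step k reads only measured data, so all steps are independent; parallel (free-run) prediction feeds back the model's own previous output, so each step must wait for the one before. The sketch below is a minimal illustration under stated assumptions: `model` is a hypothetical linear stand-in for the feedforward network, and all function names are illustrative, not from the paper.

```python
from concurrent.futures import ThreadPoolExecutor


def model(y_prev, u_prev, theta):
    """Hypothetical one-step model y[k] = f(y[k-1], u[k-1]; theta).

    A linear stand-in for the feedforward network studied in the paper.
    """
    a, b = theta
    return a * y_prev + b * u_prev


def series_parallel_predictions(y_meas, u, theta):
    """One-step-ahead predictions for k = 1..N-1.

    Each step reads only *measured* data, so the evaluations are independent
    and can be dispatched in any order. (Under CPython's GIL the thread pool
    demonstrates this independence; it gives no real speedup here.)
    """
    def predict(k):
        return model(y_meas[k - 1], u[k - 1], theta)

    with ThreadPoolExecutor() as pool:
        return list(pool.map(predict, range(1, len(y_meas))))


def parallel_predictions(y0, u, theta):
    """Free-run simulation for k = 1..N-1.

    Each step feeds back the model's own previous output, so step k cannot
    start before step k-1 has finished.
    """
    y_hat = [y0]
    for k in range(1, len(u)):
        y_hat.append(model(y_hat[-1], u[k - 1], theta))
    return y_hat[1:]


u = [1.0, -1.0, 1.0, 1.0, -1.0]
theta = (0.9, 1.0)
sim = parallel_predictions(0.0, u, theta)
# On noise-free data generated by the same model, both configurations agree.
y_meas = [0.0] + sim
print(series_parallel_predictions(y_meas, u, theta) == sim)  # prints True
```

Once the measured outputs contain noise the two sets of predictions diverge, which is why the choice of configuration matters for training.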

## Abstract

Neural network models for dynamic systems can be trained either in parallel or in series-parallel configurations. Influenced by early arguments, several papers justify the choice of series-parallel rather than parallel configuration, claiming it has a lower computational cost, better stability properties during training and provides more accurate results. Other published results, on the other hand, defend parallel training as being more robust and capable of yielding more accurate long-term predictions. The main contribution of this paper is to present a study comparing both methods under the same unified framework. We focus on three aspects: i) robustness of the estimation in the presence of noise; ii) computational cost; and iii) convergence. A unifying mathematical framework and simulation studies show situations where each training method provides better validation results, with parallel training performing better in what is believed to be more realistic scenarios. An example using measured data seems to reinforce this claim. We also show, with a novel complexity analysis and numerical examples, that both methods have similar computational cost, although series-parallel training is more amenable to parallelization. Some informal discussion about stability and convergence properties is presented and explored in the examples.
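The noise-robustness point has a standard system-identification explanation that a small linear example can illustrate. This is a sketch under explicit assumptions, not the authors' neural-network experiment: a first-order linear model stands in for the network, white noise is added only to the measured output (output-error noise), and the initial state is known to be zero. Series-parallel training then regresses on noisy past outputs, an errors-in-variables problem that pulls the estimate of `a` toward zero, while parallel training compares measurements against a noise-free simulation. Conversely, if noise entered through the model equation itself (equation-error or ARX noise), one-step-ahead least squares would be the consistent choice. All names (`simulate`, `series_parallel_fit`, `parallel_fit`) are hypothetical.

```python
import random


def simulate(a, b, u, y0=0.0):
    """Free-run output of y[k] = a*y[k-1] + b*u[k-1] (parallel configuration)."""
    y = [y0]
    for k in range(1, len(u)):
        y.append(a * y[-1] + b * u[k - 1])
    return y


def series_parallel_fit(y_meas, u):
    """Least squares on one-step-ahead errors y_meas[k] - (a*y_meas[k-1] + b*u[k-1])."""
    s_yy = s_yu = s_uu = r_y = r_u = 0.0
    for k in range(1, len(y_meas)):
        yp, up, yk = y_meas[k - 1], u[k - 1], y_meas[k]
        s_yy += yp * yp
        s_yu += yp * up
        s_uu += up * up
        r_y += yp * yk
        r_u += up * yk
    # Solve the 2x2 normal equations by Cramer's rule.
    det = s_yy * s_uu - s_yu * s_yu
    a = (r_y * s_uu - s_yu * r_u) / det
    b = (s_yy * r_u - s_yu * r_y) / det
    return a, b


def parallel_fit(y_meas, u, a_grid):
    """Minimise the free-run (simulation) error, assuming a known zero initial state.

    For fixed a the simulated output is b * s_a, so b has a closed form;
    a is found by grid search.
    """
    best = None
    y = y_meas[1:]
    for a in a_grid:
        s = simulate(a, 1.0, u)[1:]  # unit-gain response s_a
        b = sum(yk * sk for yk, sk in zip(y, s)) / sum(sk * sk for sk in s)
        cost = sum((yk - b * sk) ** 2 for yk, sk in zip(y, s))
        if best is None or cost < best[0]:
            best = (cost, a, b)
    return best[1], best[2]


random.seed(0)
N, a_true, b_true, sigma = 2000, 0.9, 1.0, 1.0
u = [random.choice((-1.0, 1.0)) for _ in range(N)]
# Output-error noise: white noise added to the measured output only.
y_meas = [y + random.gauss(0.0, sigma) for y in simulate(a_true, b_true, u)]

a_sp, b_sp = series_parallel_fit(y_meas, u)
a_p, b_p = parallel_fit(y_meas, u, [i / 200 for i in range(100, 200)])
print(f"series-parallel: a={a_sp:.3f}, b={b_sp:.3f}")
print(f"parallel:        a={a_p:.3f}, b={b_p:.3f}")
```

The grid search only keeps the sketch dependency-free; for an actual network, the simulation error would be minimised with a gradient-based or Levenberg-Marquardt-type optimiser.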

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1706.07119/full.md

## Figures

18 figures with captions in the complete paper: https://tomesphere.com/paper/1706.07119/full.md

## References

41 references — full list in the complete paper: https://tomesphere.com/paper/1706.07119/full.md

---
Source: https://tomesphere.com/paper/1706.07119