PETRA: Parallel End-to-end Training with Reversible Architectures
Stéphane Rivaud (MLIA, TAU), Louis Fournier (MLIA), Thomas Pumir, Eugene Belilovsky (MILA), Michael Eickenberg, Edouard Oyallon

TL;DR
PETRA introduces a reversible architecture-based method that enables efficient parallel training of deep neural networks by decoupling forward and backward passes, reducing memory usage, and eliminating weight stashing.
Contribution
The paper presents PETRA, a novel alternative to backpropagation that uses reversible architectures to parallelize gradient computation across devices while keeping accuracy competitive.
Findings
Achieves competitive accuracy on CIFAR-10, ImageNet32, and ImageNet datasets.
Enables independent stage computation across devices, reducing communication overhead.
Removes the need for weight stashing during training.
Abstract
Reversible architectures have been shown to perform on par with their non-reversible counterparts and have been applied in deep learning for memory savings and generative modeling. In this work, we show how reversible architectures can solve challenges in parallelizing deep model training. We introduce PETRA, a novel alternative to backpropagation for parallelizing gradient computations. PETRA facilitates effective model parallelism by enabling stages (i.e., sets of layers) to compute independently on different devices, while only needing to communicate activations and gradients between each other. By decoupling the forward and backward passes and keeping a single updated version of the parameters, the need for weight stashing is also removed. We develop a custom autograd-like training framework for PETRA, and we demonstrate its effectiveness on CIFAR-10, ImageNet32, and…
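The property behind the memory savings mentioned in the abstract is that a reversible block's input can be recomputed from its output, so the input need not be stored. The sketch below illustrates this with an additive coupling block of the kind used in RevNet-style networks. It is an assumption-laden illustration, not PETRA's actual implementation: the functions `f`, `g`, `forward`, and `inverse` are hypothetical names, and the toy residual functions stand in for real layers.

```python
import math

# Hypothetical residual functions standing in for real sub-layers.
def f(x):
    return [math.tanh(v) for v in x]

def g(x):
    return [0.5 * v for v in x]

def forward(x1, x2):
    """Additive coupling: the input is split into two halves (x1, x2)."""
    y1 = [a + b for a, b in zip(x1, f(x2))]
    y2 = [a + b for a, b in zip(x2, g(y1))]
    return y1, y2

def inverse(y1, y2):
    """Recover (x1, x2) from (y1, y2) by undoing the two additions in reverse order."""
    x2 = [a - b for a, b in zip(y2, g(y1))]
    x1 = [a - b for a, b in zip(y1, f(x2))]
    return x1, x2

x1, x2 = [0.1, -0.4, 2.0], [1.5, 0.0, -0.7]
y1, y2 = forward(x1, x2)
r1, r2 = inverse(y1, y2)

# The reconstruction matches the original input up to floating-point error.
max_err = max(abs(a - b) for a, b in zip(x1 + x2, r1 + r2))
print(max_err < 1e-9)
```

Note that `f` and `g` themselves need not be invertible; only the coupling structure is. In a reversible stage, this reconstruction can replace storing the input activation between the forward and backward passes.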
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Memory and Neural Computing · Advanced Image and Video Retrieval Techniques
Methods: Sparse Evolutionary Training
