Reversible designs for extreme memory cost reduction of CNN training
Tristan Hascoet, Quentin Febvre, Yasuo Ariki, Tetsuya Takiguchi

TL;DR
This paper introduces reversible neural network designs that drastically reduce memory usage during CNN training, enabling efficient training on devices with very limited memory.
Contribution
The paper proposes a reversible architecture with a minimal training memory footprint, which makes it possible to train deep CNNs on low-memory hardware.
Findings
Achieved 93.3% accuracy on CIFAR10
Memory cost reduced to 352 bytes per input pixel
Trained on a GTX750 GPU with only 1GB memory
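As a rough illustration of what the 352 bytes per input pixel figure implies, the sketch below converts it into a per-image and per-batch estimate. It assumes that "input pixel" means one spatial location of the input image and that the cost scales linearly with image size and batch size. It ignores weights, optimizer state, and framework overhead. The function name and the batch size of 128 are hypothetical choices for illustration, not values from the paper.

```python
BYTES_PER_INPUT_PIXEL = 352  # figure quoted in the findings above


def activation_memory_bytes(height, width, batch_size=1,
                            bytes_per_pixel=BYTES_PER_INPUT_PIXEL):
    """Estimate training memory in bytes, assuming linear scaling in pixels."""
    return height * width * batch_size * bytes_per_pixel


# A CIFAR10 image is 32x32 pixels.
per_image = activation_memory_bytes(32, 32)
# Batch size 128 is an illustrative assumption.
per_batch = activation_memory_bytes(32, 32, batch_size=128)

print(per_image)  # 360448 bytes, about 0.36 MB
print(per_batch)  # 46137344 bytes, about 46 MB
```

Under these assumptions, a batch of this size would sit well below the 1GB available on the GPU mentioned above.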
Abstract
Training Convolutional Neural Networks (CNNs) is a resource-intensive task that requires specialized hardware for efficient computation. One of the most limiting bottlenecks of CNN training is the memory cost of storing the activation values of hidden layers, which are needed to compute the weight gradients during the backward pass of the backpropagation algorithm. Recently, reversible architectures have been proposed to reduce the memory cost of training large CNNs by reconstructing the input activation values of hidden layers from their outputs during the backward pass, circumventing the need to accumulate these activations in memory during the forward pass. In this paper, we push this idea to the extreme and analyze reversible network designs yielding minimal training memory footprint. We investigate the propagation of numerical errors in long chains of invertible…
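The reconstruction idea described in the abstract can be sketched with an additive coupling block, the construction commonly used in reversible residual networks. This is a minimal pure-Python illustration, not the paper's architecture: the toy residual functions, the chain length, and all names here are assumptions made for the example. It runs a chain of blocks forward while keeping only the final output, rebuilds the input by inverting the blocks in reverse order, and reports the largest reconstruction error introduced by floating-point arithmetic.

```python
import math
import random


def make_residual_fn(rng, dim, scale=0.1):
    """Build a toy residual function F(x) = tanh(W x + b) with random weights."""
    weights = [[rng.uniform(-scale, scale) for _ in range(dim)] for _ in range(dim)]
    bias = [rng.uniform(-scale, scale) for _ in range(dim)]

    def residual(x):
        return [
            math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, bias)
        ]

    return residual


def block_forward(x1, x2, f, g):
    """Additive coupling: y1 = x1 + F(x2), y2 = x2 + G(y1)."""
    y1 = [a + r for a, r in zip(x1, f(x2))]
    y2 = [a + r for a, r in zip(x2, g(y1))]
    return y1, y2


def block_inverse(y1, y2, f, g):
    """Recover the block input from its output: x2 = y2 - G(y1), x1 = y1 - F(x2)."""
    x2 = [a - r for a, r in zip(y2, g(y1))]
    x1 = [a - r for a, r in zip(y1, f(x2))]
    return x1, x2


def reconstruction_error(num_blocks=16, dim=4, seed=0):
    """Run a chain forward, invert it, and return the max absolute input error."""
    rng = random.Random(seed)
    blocks = [
        (make_residual_fn(rng, dim), make_residual_fn(rng, dim))
        for _ in range(num_blocks)
    ]
    x1 = [rng.uniform(-1, 1) for _ in range(dim)]
    x2 = [rng.uniform(-1, 1) for _ in range(dim)]

    # Forward pass: intermediate activations are overwritten, not stored.
    h1, h2 = x1, x2
    for f, g in blocks:
        h1, h2 = block_forward(h1, h2, f, g)

    # Backward direction: rebuild each block's input from its output.
    for f, g in reversed(blocks):
        h1, h2 = block_inverse(h1, h2, f, g)

    return max(abs(a - b) for a, b in zip(x1 + x2, h1 + h2))


if __name__ == "__main__":
    print(reconstruction_error())
```

In exact arithmetic the inverse is exact. In floating point each block adds a small rounding error that later blocks can amplify, so the error can grow with the depth of the chain.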
