Reversible designs for extreme memory cost reduction of CNN training

Tristan Hascoet; Quentin Febvre; Yasuo Ariki; Tetsuya Takiguchi

arXiv:1910.11127·cs.CV·October 25, 2019

TL;DR

This paper introduces reversible neural network designs that drastically reduce memory usage during CNN training, enabling efficient training on devices with very limited memory.

Contribution

The paper proposes a reversible CNN design with a minimal training memory footprint, making it possible to train deep CNNs on low-memory hardware.

Findings

1. Achieved 93.3% accuracy on CIFAR10.
2. Reduced the training memory cost to 352 bytes per input pixel.
3. Trained on a GTX750 GPU with only 1 GB of memory.
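
The 352-bytes-per-pixel figure can be turned into a back-of-envelope estimate. This is a hypothetical illustration, not a calculation from the paper: it assumes "per input pixel" means per spatial location of the input image (32 × 32 for CIFAR10, with channels not counted separately), that the cost scales linearly with image area and batch size, and it ignores weights, gradients, optimizer state, and framework workspace. The batch size of 128 is an arbitrary choice.

```python
BYTES_PER_PIXEL = 352  # reported training memory cost per input pixel


def activation_memory_bytes(height: int, width: int, batch_size: int,
                            bytes_per_pixel: int = BYTES_PER_PIXEL) -> int:
    """Back-of-envelope estimate, ASSUMING cost scales linearly with H * W * batch_size."""
    return bytes_per_pixel * height * width * batch_size


if __name__ == "__main__":
    per_image = activation_memory_bytes(32, 32, 1)    # one 32x32 CIFAR10 image
    per_batch = activation_memory_bytes(32, 32, 128)  # hypothetical batch of 128 images
    print(f"{per_image} bytes per image, {per_batch / 2**20:.1f} MiB per batch")
```

Under these assumptions, one CIFAR10 image needs 360,448 bytes (352 KiB) of activation memory and a batch of 128 needs 44 MiB, a small fraction of the GTX750's 1 GB.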

Abstract

Training Convolutional Neural Networks (CNNs) is a resource-intensive task that requires specialized hardware for efficient computation. One of the most limiting bottlenecks of CNN training is the memory cost of storing the activation values of hidden layers, which are needed to compute the weight gradients during the backward pass of the backpropagation algorithm. Recently, reversible architectures have been proposed to reduce the memory cost of training large CNNs by reconstructing the input activation values of hidden layers from their outputs during the backward pass, circumventing the need to accumulate these activations in memory during the forward pass. In this paper, we push this idea to the extreme and analyze reversible network designs yielding a minimal training memory footprint. We investigate the propagation of numerical errors in long chains of invertible…
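
The core mechanism, reconstructing a block's inputs from its outputs instead of storing them, can be sketched with the additive coupling used in RevNets. This is not the authors' design: the residual functions `F` and `G` are hypothetical elementwise `tanh` stand-ins for convolutional sub-networks, vectors are plain Python lists, and single precision is only emulated by rounding each block's outputs through `struct` (the residual functions themselves still run in double precision). The sketch runs a chain of blocks forward keeping only the final output, walks it backwards with the exact algebraic inverse, and reports the reconstruction error, the kind of quantity whose growth along long invertible chains the abstract says the paper investigates.

```python
import math
import struct
from typing import Callable, List, Tuple

Vector = List[float]
Residual = Callable[[Vector], Vector]
Rounder = Callable[[Vector], Vector]


def keep_float64(v: Vector) -> Vector:
    """No extra rounding: Python floats are IEEE-754 doubles."""
    return list(v)


def to_float32(v: Vector) -> Vector:
    """Round each value to single precision (emulates storing float32 block outputs)."""
    return [struct.unpack("f", struct.pack("f", a))[0] for a in v]


def make_residual(weight: float, bias: float) -> Residual:
    """Hypothetical stand-in for a conv sub-network: elementwise tanh(w * x + b)."""
    return lambda x: [math.tanh(weight * a + bias) for a in x]


def coupling_forward(x1: Vector, x2: Vector, f: Residual, g: Residual,
                     rnd: Rounder) -> Tuple[Vector, Vector]:
    # Additive coupling: y1 = x1 + F(x2), y2 = x2 + G(y1)
    y1 = rnd([a + b for a, b in zip(x1, f(x2))])
    y2 = rnd([a + b for a, b in zip(x2, g(y1))])
    return y1, y2


def coupling_inverse(y1: Vector, y2: Vector, f: Residual, g: Residual,
                     rnd: Rounder) -> Tuple[Vector, Vector]:
    # Algebraic inverse, no stored activations needed: x2 = y2 - G(y1), x1 = y1 - F(x2)
    x2 = rnd([a - b for a, b in zip(y2, g(y1))])
    x1 = rnd([a - b for a, b in zip(y1, f(x2))])
    return x1, x2


def reconstruction_error(depth: int, rnd: Rounder) -> float:
    """Run `depth` blocks forward keeping only the final output, then invert the chain."""
    blocks = [(make_residual(0.5, 0.01 * i), make_residual(-0.3, 0.02 * i))
              for i in range(depth)]
    x1, x2 = rnd([0.2, -1.0, 0.7]), rnd([1.5, 0.0, -0.4])
    y1, y2 = x1, x2
    for f, g in blocks:  # forward pass: intermediate activations are discarded
        y1, y2 = coupling_forward(y1, y2, f, g, rnd)
    r1, r2 = y1, y2
    for f, g in reversed(blocks):  # backward pass: rebuild each block's input in turn
        r1, r2 = coupling_inverse(r1, r2, f, g, rnd)
    # Largest absolute gap between the reconstructed and the original inputs
    return max(abs(a - b) for a, b in zip(r1 + r2, x1 + x2))


if __name__ == "__main__":
    for depth in (1, 10, 100):
        print(f"depth={depth:4d}  float64 error={reconstruction_error(depth, keep_float64):.2e}"
              f"  float32 error={reconstruction_error(depth, to_float32):.2e}")
```

The inverse is exact in real arithmetic, so any nonzero error comes purely from floating-point rounding; comparing the two precisions at increasing depth shows how such errors can accumulate when activations are recomputed rather than stored.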
