Reversible Architectures for Arbitrarily Deep Residual Neural Networks

Bo Chang; Lili Meng; Eldad Haber; Lars Ruthotto; David Begert; Elliot Holtham

arXiv:1709.03698 · cs.CV · November 21, 2017 · 71 cites

TL;DR

This paper introduces reversible neural network architectures derived from an ordinary differential equation (ODE) interpretation of residual networks. The resulting networks can be arbitrarily deep in theory, are memory-efficient and stable to train, and achieve state-of-the-art or competitive results on image classification tasks.

Contribution

The paper develops a theoretical framework for the stability and reversibility of deep neural networks, based on interpreting deep residual networks as ODEs, and derives three reversible architectures that are deep, stable, and memory-efficient.

Findings

1. Achieved state-of-the-art or competitive results on the CIFAR-10, CIFAR-100, and STL-10 datasets.
2. Demonstrated memory efficiency and stability when training very deep networks.
3. Showed improved performance with less training data.

Abstract

Recently, deep residual networks have been successfully applied in many computer vision and natural language processing tasks, pushing the state-of-the-art performance with deeper and wider architectures. In this work, we interpret deep residual networks as ordinary differential equations (ODEs), which have long been studied in mathematics and physics with rich theoretical and empirical success. From this interpretation, we develop a theoretical framework on stability and reversibility of deep neural networks, and derive three reversible neural network architectures that can go arbitrarily deep in theory. The reversibility property allows a memory-efficient implementation, which does not need to store the activations for most hidden layers. Together with the stability of our architectures, this enables training deeper networks using only modest computational resources. We provide both…
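
The reversibility idea in the abstract can be illustrated with a small sketch. A residual update of the form y + h * F(y) is a forward Euler step of the ODE dy/dt = F(y). If the state is split into two streams that update each other in turn, each step can be undone exactly, so intermediate activations can be recomputed from the output instead of stored. The code below is a minimal sketch under stated assumptions, not the paper's architectures: the per-channel residual function, the step size `h`, and all function and parameter names (`residual`, `forward_block`, `inverse_block`, `make_layers`, `k1`, `b1`, `k2`, `b2`) are hypothetical and chosen only for illustration.

```python
import math
import random


def residual(x, k, b):
    """Hypothetical per-channel residual function: k * tanh(k * x + b)."""
    return [ki * math.tanh(ki * xi + bi) for xi, ki, bi in zip(x, k, b)]


def forward_block(y, z, p, h):
    """One reversible step: update y from z, then z from the new y."""
    y_next = [yi + h * ri for yi, ri in zip(y, residual(z, p["k1"], p["b1"]))]
    z_next = [zi - h * ri for zi, ri in zip(z, residual(y_next, p["k2"], p["b2"]))]
    return y_next, z_next


def inverse_block(y_next, z_next, p, h):
    """Undo forward_block: recover z first, then y, using the same weights."""
    z = [zi + h * ri for zi, ri in zip(z_next, residual(y_next, p["k2"], p["b2"]))]
    y = [yi - h * ri for yi, ri in zip(y_next, residual(z, p["k1"], p["b1"]))]
    return y, z


def run_forward(y, z, layers, h):
    """Apply all blocks in order, keeping only the current state."""
    for p in layers:
        y, z = forward_block(y, z, p, h)
    return y, z


def run_inverse(y, z, layers, h):
    """Reconstruct the network input from its output, last block first."""
    for p in reversed(layers):
        y, z = inverse_block(y, z, p, h)
    return y, z


def make_layers(depth, width, seed=0):
    """Random illustrative weights; not trained and not from the paper."""
    rng = random.Random(seed)

    def vec():
        return [rng.uniform(-1.0, 1.0) for _ in range(width)]

    return [{"k1": vec(), "b1": vec(), "k2": vec(), "b2": vec()}
            for _ in range(depth)]
```

In this sketch, calling `run_forward` on a 20-block stack and then `run_inverse` on the result recovers the original inputs up to floating-point rounding. That is the property a memory-efficient backward pass would rely on: hidden states are rebuilt block by block rather than cached.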

Taxonomy

Topics: Model Reduction and Neural Networks · Advanced Neural Network Applications · Adversarial Robustness in Machine Learning