An Optimal Time Variable Learning Framework for Deep Neural Networks

Harbir Antil; Hugo Díaz; Evelyn Herberg

arXiv:2204.08528·math.OC·April 20, 2022

TL;DR

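This paper introduces a framework for deep neural networks in which the time step-size (the discretization parameter) is learned separately for each layer. The framework applies to existing architectures such as ResNet, DenseNet, and Fractional-DNN, is shown to help with vanishing and exploding gradients, and is applied to an ill-posed 3D Maxwell's equation.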

Contribution

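The paper proposes an optimization framework in which the discretization parameter (time step-size) varies from layer to layer and is learned, which helps with vanishing and exploding gradients in deep networks. It also studies the stability of existing continuous DNNs such as Fractional-DNN.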

Findings

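01. Helps overcome vanishing and exploding gradients.

02. Applicable to existing networks such as ResNet, DenseNet, and Fractional-DNN.

03. Studies the stability of continuous DNNs such as Fractional-DNN and applies the approach to an ill-posed 3D Maxwell's equation.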

Abstract

Feature propagation in Deep Neural Networks (DNNs) can be associated with nonlinear discrete dynamical systems. The novelty of this paper lies in letting the discretization parameter (time step-size) vary from layer to layer and learning it within an optimization framework. The proposed framework can be applied to any of the existing networks, such as ResNet, DenseNet, or Fractional-DNN. This framework is shown to help overcome the vanishing and exploding gradient issues. Stability of some of the existing continuous DNNs, such as Fractional-DNN, is also studied. The proposed approach is applied to an ill-posed 3D Maxwell's equation.
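
The idea of a layer-dependent time step-size can be illustrated with a small sketch. It assumes the common reading of a residual layer as one forward-Euler step, y_{l+1} = y_l + tau_l * f(y_l), with a separate step-size tau_l per layer. This is not the authors' implementation: the scalar features, the tanh activation, the finite-difference gradients, and the names `forward`, `tau_gradient`, and `learn_taus` are all assumptions made for illustration.

```python
import math


def forward(y0, weights, biases, taus):
    """Propagate a scalar feature through residual layers.

    Layer l computes y_{l+1} = y_l + tau_l * tanh(w_l * y_l + b_l),
    i.e. one forward-Euler step with its own step-size tau_l.
    (Illustrative assumption: scalar features and tanh activation.)
    """
    y = y0
    for w, b, tau in zip(weights, biases, taus):
        y = y + tau * math.tanh(w * y + b)
    return y


def tau_gradient(y0, weights, biases, taus, target, eps=1e-6):
    """Central finite-difference gradient of 0.5*(y_L - target)^2 w.r.t. each tau_l."""
    def loss(ts):
        return 0.5 * (forward(y0, weights, biases, ts) - target) ** 2

    grads = []
    for l in range(len(taus)):
        up = list(taus)
        down = list(taus)
        up[l] += eps
        down[l] -= eps
        grads.append((loss(up) - loss(down)) / (2 * eps))
    return grads


def learn_taus(y0, weights, biases, taus, target, lr=0.1, steps=200):
    """Plain gradient descent on the step-sizes only; weights and biases stay fixed."""
    taus = list(taus)
    for _ in range(steps):
        grads = tau_gradient(y0, weights, biases, taus, target)
        taus = [t - lr * g for t, g in zip(taus, grads)]
    return taus


if __name__ == "__main__":
    w, b = [1.0, 1.0, 1.0], [0.0, 0.0, 0.0]
    learned = learn_taus(0.5, w, b, [0.1, 0.1, 0.1], target=1.0)
    print("learned step-sizes:", learned)
    print("network output:", forward(0.5, w, b, learned))
```

In a real network the step-sizes would be trained jointly with the weights using automatic differentiation. Here only the step-sizes are updated, so that their effect on the output is easy to isolate.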

Code & Models

Repositories

frederikkoehne/time_variable_learning (PyTorch)

Taxonomy

Topics: Model Reduction and Neural Networks · Neural Networks and Applications · Machine Learning and ELM

Methods: Batch Normalization · Kaiming Initialization · Residual Connection · Dense Connections · Max Pooling · 1x1 Convolution · Convolution · Softmax · Residual Block · Bottleneck Residual Block