An Optimal Time Variable Learning Framework for Deep Neural Networks
Harbir Antil, Hugo Díaz, Evelyn Herberg

TL;DR
This paper introduces a framework for deep neural networks that learns a separate time step-size for each layer. It applies to existing architectures such as ResNet, DenseNet, and Fractional-DNN, helps with vanishing and exploding gradients, and is applied to an ill-posed 3D Maxwell's equation.
Contribution
It proposes an optimization framework in which the discretization parameter (time step-size) varies from layer to layer and is learned, which helps with the vanishing and exploding gradient issues in deep networks.
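As a rough illustration of the idea, the difference between a fixed and a layer-dependent step-size can be written for a ResNet-type update. The symbols below ($x_\ell$ for the features at layer $\ell$, $W_\ell$ and $b_\ell$ for weights and biases, $\sigma$ for the activation, $\tau_\ell$ for the step-size) are notation assumed for this sketch and are not taken from this summary:

```latex
% Fixed step-size: a residual network read as a forward Euler scheme
x_{\ell+1} = x_\ell + \tau \, \sigma(W_\ell x_\ell + b_\ell), \qquad \ell = 0, \dots, N-1.

% Layer-dependent step-size: each \tau_\ell is an additional unknown,
% learned together with W_\ell and b_\ell
x_{\ell+1} = x_\ell + \tau_\ell \, \sigma(W_\ell x_\ell + b_\ell), \qquad \ell = 0, \dots, N-1.
```

Setting every $\tau_\ell$ to the same constant recovers the fixed-step case, so the variable-step model contains the standard one as a special case.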
Findings
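Helps overcome the vanishing and exploding gradient issues.
Applicable to existing networks such as ResNet, DenseNet, or Fractional-DNN.
Studies the stability of existing continuous DNNs such as Fractional-DNN, and applies the approach to an ill-posed 3D Maxwell's equation.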
Abstract
Feature propagation in Deep Neural Networks (DNNs) can be associated with nonlinear discrete dynamical systems. The novelty of this paper lies in letting the discretization parameter (time step-size) vary from layer to layer and learning it within an optimization framework. The proposed framework can be applied to any existing network such as ResNet, DenseNet, or Fractional-DNN. This framework is shown to help overcome the vanishing and exploding gradient issues. The stability of some existing continuous DNNs, such as Fractional-DNN, is also studied. The proposed approach is applied to an ill-posed 3D Maxwell's equation.
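The abstract's central idea, a time step-size that differs from layer to layer, can be sketched as a small forward pass. This is a minimal illustration and not the authors' implementation: the function names `layer_update` and `forward`, the `tanh` activation, and the residual form of the update are all assumptions made for the example. In an actual training setup the entries of `taus` would be optimized together with the weights and biases.

```python
import math


def layer_update(x, weight, bias, tau):
    """One residual step: x + tau * tanh(W x + b).

    `tau` plays the role of the time step-size for this layer
    (hypothetical naming; the activation choice is also an assumption).
    """
    # Affine map W x + b, computed row by row.
    pre = [
        sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
        for row, b_i in zip(weight, bias)
    ]
    # Residual update scaled by the layer's own step-size.
    return [x_i + tau * math.tanh(p_i) for x_i, p_i in zip(x, pre)]


def forward(x, weights, biases, taus):
    """Propagate features through all layers, one step-size per layer."""
    for weight, bias, tau in zip(weights, biases, taus):
        x = layer_update(x, weight, bias, tau)
    return x


# Two layers acting on 2-dimensional features, each with its own step-size.
identity = [[1.0, 0.0], [0.0, 1.0]]
weights = [identity, identity]
biases = [[0.0, 0.0], [0.0, 0.0]]
taus = [0.5, 0.1]  # would be learned in the framework; fixed here

features = forward([1.0, -1.0], weights, biases, taus)
```

With every step-size set to zero the network reduces to the identity map, which makes the role of `taus` easy to check by hand.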
Taxonomy
Topics: Model Reduction and Neural Networks · Neural Networks and Applications · Machine Learning and ELM
Methods: Batch Normalization · Kaiming Initialization · Residual Connection · Dense Connections · Max Pooling · 1x1 Convolution · Convolution · Softmax · Residual Block · Bottleneck Residual Block
