Locally adaptive activation functions with slope recovery term for deep and physics-informed neural networks

Ameya D. Jagtap; Kenji Kawaguchi; George Em Karniadakis

arXiv:1909.12228·cs.LG·April 28, 2021

TL;DR

This paper introduces locally adaptive activation functions with a slope recovery term for deep and physics-informed neural networks. Theoretical and empirical analysis indicates that the method speeds up training, improves convergence, and keeps gradient descent from being attracted to sub-optimal critical points.

Contribution

The paper presents layer-wise and neuron-wise adaptive activation functions with a slope recovery term, which improve training efficiency and convergence in neural networks.

Findings

01

Accelerated training convergence with slope recovery term.

02

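Theoretical proof that gradient descent is not attracted to sub-optimal critical points or local minima under practical conditions on initialization and learning rate.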

03

Implicit conditioning matrices improve optimization dynamics.

Abstract

We propose two approaches of locally adaptive activation functions namely, layer-wise and neuron-wise locally adaptive activation functions, which improve the performance of deep and physics-informed neural networks. The local adaptation of activation function is achieved by introducing a scalable parameter in each layer (layer-wise) and for every neuron (neuron-wise) separately, and then optimizing it using a variant of stochastic gradient descent algorithm. In order to further increase the training speed, an activation slope based slope recovery term is added in the loss function, which further accelerates convergence, thereby reducing the training cost. On the theoretical side, we prove that in the proposed method, the gradient descent algorithms are not attracted to sub-optimal critical points or local minima under practical conditions on the initialization and learning rate, and…
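The two ideas in the abstract, a trainable slope inside the activation and a loss term built from those slopes, can be illustrated with a minimal pure-Python sketch. Everything specific here is an assumption for illustration, since the abstract does not state the formulas: the activation is taken to be tanh(n * a * z) with a fixed scaling factor n and a trainable slope a, and the slope recovery term is taken to be the reciprocal of the mean of exp(average slope per layer). The function names are hypothetical, and no training loop is shown.

```python
import math


def layerwise_activation(pre_activations, a, n=1.0):
    """Layer-wise variant: one trainable slope `a` shared by the whole layer.

    Assumed form: tanh(n * a * z), where `n` is a fixed scaling factor.
    """
    return [math.tanh(n * a * z) for z in pre_activations]


def neuronwise_activation(pre_activations, slopes, n=1.0):
    """Neuron-wise variant: one trainable slope per neuron in the layer."""
    if len(slopes) != len(pre_activations):
        raise ValueError("need exactly one slope per neuron")
    return [math.tanh(n * a_i * z) for a_i, z in zip(slopes, pre_activations)]


def slope_recovery(layer_slopes):
    """Assumed slope recovery term, added to the loss during training.

    `layer_slopes` holds one list of slopes per hidden layer: a single
    value for the layer-wise variant, one value per neuron otherwise.
    The term shrinks as the average slope grows, so minimizing it
    pushes the slopes upward.
    """
    mean_exp = sum(
        math.exp(sum(slopes) / len(slopes)) for slopes in layer_slopes
    ) / len(layer_slopes)
    return 1.0 / mean_exp


if __name__ == "__main__":
    z = [0.5, -1.0]
    print(layerwise_activation(z, a=1.0))
    print(neuronwise_activation(z, slopes=[1.0, 2.0]))
    print(slope_recovery([[1.0], [1.0]]))
```

In an actual network the slopes would be registered as trainable parameters and updated by the same optimizer as the weights, with the recovery term added to the data or physics loss.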
