A Framework for Provably Stable and Consistent Training of Deep Feedforward Networks

Arunselvan Ramaswamy; Shalabh Bhatnagar; Naman Saxena

arXiv:2305.12125·cs.LG·May 23, 2023·1 cites

Open Access

TL;DR

This paper introduces a training algorithm that updates the output layer with clipped gradients and the remaining layers with standard gradients, together with a novel squashing activation function (tGELU), to ensure stable and consistent training of deep neural networks in supervised (classification and regression) and reinforcement learning tasks.

Contribution

The paper proposes a provably stable training algorithm for deep feedforward networks that applies gradient clipping to the output layer only, and a new squashing activation function, Truncated GELU (tGELU), intended to prevent vanishing gradients and improve stability.
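The split update described here can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: it assumes clipping by Euclidean norm, and the names `clip_by_norm`, `mixed_sgd_step`, the `"hidden"`/`"output"` parameter groups, and all numeric values are hypothetical.

```python
import math


def clip_by_norm(grad, max_norm):
    """Rescale grad so that its Euclidean norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= max_norm or norm == 0.0:
        return list(grad)
    scale = max_norm / norm
    return [g * scale for g in grad]


def mixed_sgd_step(params, grads, lr, max_norm):
    """One SGD step: clipped gradient for the 'output' group, raw gradient elsewhere."""
    new_params = {}
    for name, weights in params.items():
        g = grads[name]
        if name == "output":
            # Assumption: norm clipping is applied to the output layer only.
            g = clip_by_norm(g, max_norm)
        new_params[name] = [w - lr * gi for w, gi in zip(weights, g)]
    return new_params


# Toy example: both groups receive large gradients, but only 'output' is clipped.
params = {"hidden": [0.5, -0.5], "output": [1.0, 1.0]}
grads = {"hidden": [3.0, 4.0], "output": [30.0, 40.0]}
updated = mixed_sgd_step(params, grads, lr=0.1, max_norm=1.0)
```

In this toy run the output-layer gradient (norm 50) is rescaled to norm 1 before the step, while the hidden-layer gradient is applied unchanged.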

Findings

01. Training stability is improved with low variance in updates.
02. Reinforcement learning benefits include omission of target networks.
03. Classification tasks show reduced variance and smoother loss reduction.

Abstract

We present a novel algorithm for training deep neural networks in supervised (classification and regression) and unsupervised (reinforcement learning) scenarios. This algorithm combines standard stochastic gradient descent with the gradient clipping method. The output layer is updated using clipped gradients, while the rest of the neural network is updated using standard gradients. Updating the output layer using clipped gradients stabilizes it. We show that the remaining layers are automatically stabilized provided the neural network is composed only of squashing (compact range) activations. We also present a novel squashing activation function, obtained by modifying a Gaussian Error Linear Unit (GELU) to have compact range; we call it Truncated GELU (tGELU). Unlike other squashing activations, such as sigmoid, the range of tGELU can be explicitly specified. As a consequence, the…
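To make the idea of a squashing activation with an explicitly chosen range concrete, the sketch below clamps the input to a GELU between two thresholds. This is an assumption for illustration only: the abstract does not give the tGELU formula, so `truncated_gelu` and the thresholds `t_left` and `t_right` are hypothetical and should not be read as the paper's definition.

```python
import math


def gelu(x):
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def truncated_gelu(x, t_left=-3.0, t_right=3.0):
    """Illustrative compact-range variant of GELU (not the paper's tGELU).

    The input is clamped to [t_left, t_right] before GELU is applied, so the
    output is bounded and the bounds follow directly from the thresholds.
    """
    clamped = min(max(x, t_left), t_right)
    return gelu(clamped)


# Inside the thresholds the function matches GELU; outside it saturates.
inside = truncated_gelu(1.0)
saturated_high = truncated_gelu(100.0)
saturated_low = truncated_gelu(-100.0)
```

Unlike sigmoid, whose range is fixed at (0, 1), the bounds here move with `t_left` and `t_right`. A hard clamp like this one has zero gradient outside the thresholds, which is a property of this sketch and not a claim about the paper's function.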

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Machine Learning and Data Classification · Neural Networks and Applications

Methods: Gradient Clipping · Q-Learning