A Framework for Provably Stable and Consistent Training of Deep Feedforward Networks
Arunselvan Ramaswamy, Shalabh Bhatnagar, Naman Saxena

TL;DR
This paper introduces a training algorithm that updates the output layer with clipped gradients and all other layers with standard gradients, together with a novel squashing activation function (tGELU), to ensure stability and consistency in deep neural network training across supervised (classification and regression) and reinforcement learning tasks.
Contribution
The paper proposes a provably stable training algorithm for deep networks using gradient clipping on the output layer and a new activation function, tGELU, to prevent vanishing gradients and improve stability.
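The split update described above can be illustrated with a minimal sketch. It assumes norm-based clipping and treats each layer's parameters as a flat list; the function names, the learning rate, and the clipping threshold are illustrative choices, not taken from the paper.

```python
import math


def clip_by_norm(grad, max_norm):
    """Rescale grad so that its Euclidean norm is at most max_norm.

    Norm-based clipping is an assumption here; the summary only says
    "clipped gradients".
    """
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= max_norm:
        return list(grad)
    scale = max_norm / norm
    return [g * scale for g in grad]


def mixed_sgd_step(hidden, output, hidden_grad, output_grad,
                   lr=0.1, max_norm=1.0):
    """One update step: plain SGD on hidden weights, clipped SGD on output weights."""
    clipped = clip_by_norm(output_grad, max_norm)
    new_hidden = [w - lr * g for w, g in zip(hidden, hidden_grad)]
    new_output = [w - lr * g for w, g in zip(output, clipped)]
    return new_hidden, new_output


# Same raw gradient for both groups; only the output-layer gradient is rescaled.
hidden, output = mixed_sgd_step(
    hidden=[0.5, -0.5], output=[1.0, 1.0],
    hidden_grad=[3.0, 4.0], output_grad=[3.0, 4.0],
)
```

In this toy call the gradient norm is 5, so the output-layer step is scaled down by a factor of 5 while the hidden-layer step is applied unchanged.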
Findings
Training stability is improved with low variance in updates.
In reinforcement learning, target networks can be omitted.
Classification tasks show reduced variance and smoother loss reduction.
Abstract
We present a novel algorithm for training deep neural networks in supervised (classification and regression) and unsupervised (reinforcement learning) scenarios. This algorithm combines the standard stochastic gradient descent and the gradient clipping method. The output layer is updated using clipped gradients; the rest of the neural network is updated using standard gradients. Updating the output layer using clipped gradients stabilizes it. We show that the remaining layers are automatically stabilized provided the neural network is only composed of squashing (compact range) activations. We also present a novel squashing activation function, obtained by modifying a Gaussian Error Linear Unit (GELU) to have compact range, which we call Truncated GELU (tGELU). Unlike other squashing activations, such as sigmoid, the range of tGELU can be explicitly specified. As a consequence, the…
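To make the idea of a GELU variant with an explicitly chosen compact range concrete, here is a hedged sketch. It is not the paper's tGELU definition, which this summary does not give. It simply clamps the input to a hypothetical interval `[t_left, t_right]` before applying the exact GELU, which is one simple way to obtain a bounded output.

```python
import math


def gelu(x):
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def truncated_gelu(x, t_left=-3.0, t_right=3.0):
    """Illustrative bounded GELU: clamp the input, then apply GELU.

    This is an assumption made for illustration, NOT the paper's tGELU.
    t_left and t_right are hypothetical parameters that set the range.
    """
    clamped = max(t_left, min(t_right, x))
    return gelu(clamped)


# Inside the interval the function matches GELU; outside it saturates.
inside = truncated_gelu(1.0)
saturated_high = truncated_gelu(100.0)
saturated_low = truncated_gelu(-100.0)
```

Note that this clamped version has zero slope outside the interval, so the authors' construction may well differ in how it behaves there.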
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Machine Learning and Data Classification · Neural Networks and Applications
Methods: Gradient Clipping · Q-Learning
