Sampling-based Gradient Regularization for Capturing Long-Term Dependencies in Recurrent Neural Networks
Artem Chernodub, Dimitri Nowicki

TL;DR
This paper introduces a universal sampling-based gradient regularization technique for RNNs that maintains gradient norms within suitable ranges, enhancing the network's ability to learn long-term dependencies in sequences.
Contribution
The authors propose a novel method to estimate and control the contribution of each training example to the gradient norm, improving RNN training for long-term dependencies.
Findings
RNNs trained with the method can detect links between events separated by approximately 100 time steps.
The method maintains stable gradient norms during training.
Experimental results show improved long-term dependency learning.
Abstract
The vanishing (and exploding) gradient effect is a common problem for recurrent neural networks with nonlinear activation functions that use backpropagation to compute derivatives. Deep feedforward neural networks with many hidden layers also suffer from this effect. In this paper we propose a novel universal technique that keeps the norm of the gradient in a suitable range. We construct a way to estimate the contribution of each training example to the norm of the long-term components of the target function's gradient. Using this subroutine we can construct mini-batches for stochastic gradient descent (SGD) training that lead to high performance and accuracy of the trained network even for very complex tasks. We provide a straightforward mathematical estimate of a mini-batch's impact on the gradient norm and prove its correctness theoretically. To check our…
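The idea of scoring each training example by its effect on the long-term gradient norm and then building mini-batches from those scores can be illustrated with a minimal sketch. This is not the authors' estimator: it assumes a plain tanh RNN, uses the spectral norm of the Jacobian of the last hidden state with respect to the first as a stand-in for the "long-term component", and keeps examples whose score falls between two quantiles. The names long_term_jacobian_norm and select_minibatch, the layer sizes, and the quantile thresholds are all hypothetical choices made for illustration.

```python
import numpy as np

def long_term_jacobian_norm(W_rec, W_in, xs):
    """Spectral norm of d h_T / d h_0 for a tanh RNN run on one sequence xs.

    Assumption: this Jacobian norm is used here as a simple proxy for how
    much long-term gradient signal a single example carries.
    """
    n = W_rec.shape[0]
    h = np.zeros(n)
    J = np.eye(n)  # accumulated Jacobian d h_t / d h_0
    for x in xs:
        h = np.tanh(W_rec @ h + W_in @ x)
        # One-step Jacobian: d h_t / d h_{t-1} = diag(1 - h_t^2) @ W_rec
        J = ((1.0 - h**2)[:, None] * W_rec) @ J
    return np.linalg.norm(J, 2)

def select_minibatch(norms, low, high, batch_size, rng):
    """Sample up to batch_size example indices whose score lies in [low, high]."""
    eligible = np.flatnonzero((norms >= low) & (norms <= high))
    if eligible.size == 0:
        return eligible
    k = min(batch_size, eligible.size)
    return rng.choice(eligible, size=k, replace=False)

# Toy setup (all sizes are illustrative assumptions).
rng = np.random.default_rng(0)
n_hidden, n_in, T, n_examples = 8, 3, 20, 32
W_rec = rng.normal(scale=1.0 / np.sqrt(n_hidden), size=(n_hidden, n_hidden))
W_in = rng.normal(scale=0.5, size=(n_hidden, n_in))
data = rng.normal(size=(n_examples, T, n_in))

# Score every example, then keep the ones in the middle of the distribution.
norms = np.array([long_term_jacobian_norm(W_rec, W_in, seq) for seq in data])
low, high = np.quantile(norms, [0.25, 0.75])
batch = select_minibatch(norms, low, high, batch_size=8, rng=rng)
```

The selected indices in batch would then feed an ordinary SGD step. Filtering by quantiles is only one possible rule; the summary above does not specify how the authors choose the suitable range.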
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Adversarial Robustness in Machine Learning
