Adaptive Noise Injection: A Structure-Expanding Regularization for RNN
Rui Li, Kai Shuang, Mengyu Gu, Sen Su

TL;DR
This paper introduces Adaptive Noise Injection (ANI), a regularization method for RNNs that uses an extra RNN branch to generate adaptive noise, improving training stability and performance, especially in the early stages of training.
Contribution
ANI is a new structure-expanding regularization technique that adaptively injects noise from an auxiliary RNN to improve RNN training and expressiveness.
Findings
ANI regularizes RNNs effectively during early training stages.
ANI improves training performance on PTB, WT2, and WT103 datasets.
Robustness against parameter update errors is enhanced with ANI.
Abstract
The vanilla LSTM has become one of the most promising architectures for word-level language modeling, but, as with other recurrent neural networks, overfitting remains a key barrier to its effectiveness. Existing noise-injection regularizations introduce random noise of fixed intensity, which inhibits the learning of the RNN throughout the training process. In this paper, we propose a new structure-expanding regularization method called Adaptive Noise Injection (ANI), which treats the output of an extra RNN branch as a kind of adaptive noise and injects it into the main-branch RNN output. Because the adaptive noise can be improved as training proceeds, its negative effects can be weakened and even transformed into a positive effect that further improves the expressiveness of the main-branch RNN. As a result, ANI can regularize the RNN in the early stage of training and…
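The mechanism described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not say how the noise is injected, so the additive combination below is an assumption, as are the plain tanh cells (standing in for LSTM cells), the layer sizes, and the hypothetical helper names `init_rnn`, `rnn_forward`, and `ani_forward`.

```python
import numpy as np


def init_rnn(rng, d_in, d_hid, scale=0.1):
    """Create small random weights for one tanh RNN branch (illustrative sizes)."""
    w_in = rng.normal(0.0, scale, (d_in, d_hid))
    w_rec = rng.normal(0.0, scale, (d_hid, d_hid))
    b = np.zeros(d_hid)
    return w_in, w_rec, b


def rnn_forward(x, w_in, w_rec, b):
    """Run a plain tanh RNN over x of shape (T, d_in); return outputs of shape (T, d_hid)."""
    h = np.zeros(w_rec.shape[0])
    outputs = []
    for x_t in x:
        h = np.tanh(x_t @ w_in + h @ w_rec + b)
        outputs.append(h)
    return np.stack(outputs)


def ani_forward(x, main_params, noise_params):
    """Combine the main branch with an extra branch whose output acts as noise.

    ASSUMPTION: the injection is a simple element-wise sum; the source only
    says the extra branch's output is injected into the main-branch output.
    """
    main_out = rnn_forward(x, *main_params)
    noise_out = rnn_forward(x, *noise_params)  # learned, input-dependent "noise"
    return main_out + noise_out, main_out, noise_out


rng = np.random.default_rng(0)
T, d_in, d_hid = 5, 4, 3  # hypothetical sequence length and layer sizes
x = rng.normal(size=(T, d_in))
main_params = init_rnn(rng, d_in, d_hid)
noise_params = init_rnn(rng, d_in, d_hid)

combined, main_out, noise_out = ani_forward(x, main_params, noise_params)
print(combined.shape)
```

Unlike random noise of fixed intensity, the extra branch has its own trainable weights, so in a full training setup its output would change as those weights are updated. The sketch covers only the forward pass and omits the loss and the training loop.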
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory
