Adaptive-saturated RNN: Remember more with less instability
Khoi Minh Nguyen-Duy, Quang Pham, Binh T. Nguyen

TL;DR
This paper introduces Adaptive-Saturated RNNs (asRNN), a model that dynamically adjusts its saturation level between that of a vanilla RNN and that of an orthogonal RNN, aiming to combine the memory capacity of the former with the training stability of the latter. It reports competitive results on sequence learning benchmarks.
Contribution
The paper proposes a new RNN variant, asRNN, that adaptively adjusts saturation levels to combine the advantages of vanilla and orthogonal RNNs.
Findings
asRNN shows encouraging results on challenging sequence learning benchmarks compared with several strong competitors.
By adapting its saturation level, asRNN combines the capacity of a vanilla RNN with the training stability of orthogonal RNNs.
The research code is publicly available at https://github.com/ndminhkhoi46/asRNN/.
Abstract
Orthogonal parameterization is a compelling solution to the vanishing gradient problem (VGP) in recurrent neural networks (RNNs). With orthogonal parameters and non-saturated activation functions, gradients in such models are constrained to unit norms. On the other hand, although the traditional vanilla RNNs are seen to have higher memory capacity, they suffer from the VGP and perform badly in many applications. This work proposes Adaptive-Saturated RNNs (asRNN), a variant that dynamically adjusts its saturation level between the two mentioned approaches. Consequently, asRNN enjoys both the capacity of a vanilla RNN and the training stability of orthogonal RNNs. Our experiments show encouraging results of asRNN on challenging sequence learning benchmarks compared to several strong competitors. The research code is accessible at https://github.com/ndminhkhoi46/asRNN/.
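The abstract's claim about gradient norms can be illustrated with a small NumPy sketch. This is not the asRNN model or the authors' code. It only compares an orthogonal recurrent matrix under a non-saturating (identity) activation against the same matrix under a saturating tanh. The function names, the hidden size of 32, and the 50 time steps are arbitrary choices made for the illustration.

```python
import numpy as np

def random_orthogonal(n, rng):
    # The Q factor of a QR decomposition of a square Gaussian matrix is orthogonal.
    q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return q

def gradient_norm_through_time(W, phi, dphi, steps, seed=0):
    """Norm of a unit gradient after backpropagating through `steps` updates
    of the toy recurrence h_t = phi(W h_{t-1}) (no inputs, no bias)."""
    rng = np.random.default_rng(seed)
    n = W.shape[0]
    h = rng.standard_normal(n)

    # Forward pass: record phi'(z_t) at every step.
    derivs = []
    for _ in range(steps):
        z = W @ h
        derivs.append(dphi(z))
        h = phi(z)

    # Backward pass: g_{t-1} = W^T (phi'(z_t) * g_t), starting from a unit vector.
    g = np.ones(n) / np.sqrt(n)
    for d in reversed(derivs):
        g = W.T @ (d * g)
    return np.linalg.norm(g)

rng = np.random.default_rng(42)
W = random_orthogonal(32, rng)

# Non-saturating activation (identity): every Jacobian is W, so the norm is preserved.
linear_norm = gradient_norm_through_time(
    W, phi=lambda z: z, dphi=lambda z: np.ones_like(z), steps=50
)

# Saturating activation (tanh): phi'(z) = 1 - tanh(z)^2 <= 1 shrinks the gradient.
tanh_norm = gradient_norm_through_time(
    W, phi=np.tanh, dphi=lambda z: 1.0 - np.tanh(z) ** 2, steps=50
)

print(f"orthogonal + identity: {linear_norm:.6f}")
print(f"orthogonal + tanh:     {tanh_norm:.6f}")
```

In this toy setting the identity case keeps the gradient norm at one, while the tanh case can only shrink it. That is the tension between stability and saturation that the abstract describes asRNN as adjusting between.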
Taxonomy
Topics: Neural Networks and Applications · Human Pose and Action Recognition · Advanced Neural Network Applications
