P-ADMMiRNN: Training RNN with Stable Convergence via An Efficient and Paralleled ADMM Approach
Yu Tang, Zhigang Kan, Dequan Sun, Jingjing Xiao, Zhiquan Lai, Linbo Qiao, Dongsheng Li

TL;DR
This paper introduces ADMMiRNN, a novel framework for training RNNs with stable convergence using an efficient, parallelized ADMM approach, effectively addressing gradient issues and improving training stability.
Contribution
The work develops a new ADMM-based framework for RNN training, providing convergence analysis, novel update rules, and a parallelized version for improved efficiency and stability.
Findings
ADMMiRNN achieves stable convergence and outperforms baselines.
It effectively prevents gradient vanishing and exploding.
The parallel version enables asynchronous training of RNNs.
Abstract
It is hard to train a Recurrent Neural Network (RNN) with stable convergence while avoiding the gradient vanishing and exploding problems, because the weights in the recurrent unit are repeated from iteration to iteration. Moreover, RNNs are sensitive to the initialization of weights and biases, which makes training difficult. The Alternating Direction Method of Multipliers (ADMM) has become a promising algorithm for training neural networks beyond traditional stochastic gradient algorithms, owing to its gradient-free features and immunity to unsatisfactory conditions. However, ADMM cannot be applied to train RNNs directly, since the state in the recurrent unit is repetitively updated over timesteps. Therefore, this work builds a new framework named ADMMiRNN upon the unfolded form of the RNN to address the above challenges simultaneously. We also provide novel update rules and theoretical convergence analysis.…
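To make the "unfolded form" concrete, here is a minimal sketch in plain Python. It is not the paper's formulation or its update rules: the tanh cell, the function names (`matvec`, `unfold_rnn`, `max_residuals`), and the toy weights are all assumptions chosen for illustration. It unrolls a small vanilla RNN over timesteps and then measures how far a set of per-timestep variables is from satisfying the equality constraints a_t = W s_{t-1} + U x_t + b and s_t = tanh(a_t), which is the kind of coupling an ADMM-style splitting would have to enforce.

```python
import math

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def unfold_rnn(xs, W, U, b, s0):
    """Unroll a vanilla tanh RNN (an assumed cell, for illustration only).

    The same W, U, b are reused at every timestep; this sharing is what
    ties the timesteps together in the unfolded network.
    """
    pre, states = [], []
    s = s0
    for x in xs:
        a = [w + u + c for w, u, c in zip(matvec(W, s), matvec(U, x), b)]
        s = [math.tanh(v) for v in a]
        pre.append(a)
        states.append(s)
    return pre, states

def max_residuals(xs, pre, states, W, U, b, s0):
    """Largest absolute violation of the two per-timestep constraints:
    a_t = W s_{t-1} + U x_t + b (linear) and s_t = tanh(a_t) (activation)."""
    lin = act = 0.0
    prev = s0
    for x, a, s in zip(xs, pre, states):
        target = [w + u + c for w, u, c in zip(matvec(W, prev), matvec(U, x), b)]
        lin = max(lin, max(abs(p - q) for p, q in zip(a, target)))
        act = max(act, max(abs(p - math.tanh(q)) for p, q in zip(s, a)))
        prev = s
    return lin, act

# Hypothetical toy problem: 2 hidden units, 1 input feature, 3 timesteps.
W = [[0.5, -0.2], [0.1, 0.3]]
U = [[1.0], [-0.5]]
b = [0.0, 0.0]
s0 = [0.0, 0.0]
xs = [[1.0], [0.5], [-1.0]]

pre, states = unfold_rnn(xs, W, U, b, s0)
lin_res, act_res = max_residuals(xs, pre, states, W, U, b, s0)
```

Variables produced by a plain forward pass satisfy both constraints, so both residuals are zero. If the per-timestep variables are instead treated as free unknowns, as a splitting method would, the residuals measure how far the current iterate is from a consistent RNN trajectory.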
Taxonomy
Topics: Machine Learning and ELM · Neural Networks and Applications · Advanced Neural Network Applications
Methods: Alternating Direction Method of Multipliers
