P-ADMMiRNN: Training RNN with Stable Convergence via An Efficient and Paralleled ADMM Approach

Yu Tang, Zhigang Kan, Dequan Sun, Jingjing Xiao, Zhiquan Lai, Linbo Qiao, Dongsheng Li

arXiv:2006.05622 · cs.LG · March 29, 2022


TL;DR

This paper introduces ADMMiRNN, a novel framework that trains RNNs with an efficient, parallelized ADMM approach instead of stochastic gradient methods, avoiding gradient vanishing and exploding and achieving stable convergence.

Contribution

The work develops a new ADMM-based framework for RNN training built on the unfolded form of the RNN, together with novel update rules, a theoretical convergence analysis, and a parallelized version for improved efficiency and stability.
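The paper's own update rules are not reproduced on this page. As a hedged illustration of the ADMM pattern the framework builds on, the sketch below runs textbook scaled-form ADMM on a toy problem: minimize 0.5·||x − a||² + λ·||z||₁ subject to x = z. The toy objective and the names `soft_threshold` and `admm_l1_denoise` are hypothetical choices for illustration, not part of ADMMiRNN. Each step solves a small subproblem in closed form or with a proximal operator, so no gradient of the nonsmooth term is ever taken.

```python
def soft_threshold(v: float, k: float) -> float:
    """Proximal operator of k*|.|: shrink v toward zero by k."""
    if v > k:
        return v - k
    if v < -k:
        return v + k
    return 0.0


def admm_l1_denoise(a: list[float], lam: float, rho: float = 1.0,
                    iters: int = 200) -> list[float]:
    """Scaled-form ADMM for min 0.5*||x - a||^2 + lam*||z||_1  s.t.  x = z.

    Hypothetical toy problem used only to show the alternating updates.
    """
    n = len(a)
    z = [0.0] * n
    u = [0.0] * n  # scaled dual variable (Lagrange multiplier divided by rho)
    for _ in range(iters):
        # x-update: closed-form minimizer of the smooth term plus the quadratic penalty
        x = [(a[i] + rho * (z[i] - u[i])) / (1.0 + rho) for i in range(n)]
        # z-update: proximal step on the nonsmooth L1 term (gradient-free)
        z = [soft_threshold(x[i] + u[i], lam / rho) for i in range(n)]
        # dual update: accumulate the remaining constraint violation x - z
        u = [u[i] + x[i] - z[i] for i in range(n)]
    return z
```

For example, `admm_l1_denoise([3.0, -0.5, -2.0], lam=1.0)` converges to values very close to `[2.0, 0.0, -1.0]`, the exact soft-thresholded solution. ADMMiRNN applies the same alternate-then-update-the-dual idea to the variables of an unfolded RNN, with its own update rules.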

Findings

01

ADMMiRNN achieves stable convergence and outperforms baselines.

02

It effectively prevents gradient vanishing and exploding.

03

The parallel version enables asynchronous training of RNNs.
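To see why gradient vanishing and exploding arise in RNNs at all, consider a hypothetical scalar RNN h_t = act(w · h_{t−1}) (a toy model, not the paper's architecture). Backpropagation through time multiplies one local derivative per step, and every factor contains the same reused weight w, so the product tends to shrink toward zero or blow up as the sequence grows. The function name `bptt_gradient_scale` is an assumption for illustration.

```python
import math


def bptt_gradient_scale(w: float, steps: int, activation: str = "tanh",
                        h0: float = 1.0) -> float:
    """Chain-rule product d h_T / d h_0 for the toy scalar RNN h_t = act(w * h_{t-1}).

    The same recurrent weight w is reused at every step, so the result is a
    product of `steps` local derivatives that each contain w.
    """
    h = h0
    grad = 1.0
    for _ in range(steps):
        if activation == "linear":
            h = w * h
            local = w  # d h_t / d h_{t-1} for the identity activation
        elif activation == "tanh":
            h = math.tanh(w * h)
            local = w * (1.0 - h * h)  # tanh'(z) = 1 - tanh(z)^2
        else:
            raise ValueError(f"unknown activation: {activation}")
        grad *= local
    return grad


for w in (0.5, 1.0, 1.5):
    print(w, bptt_gradient_scale(w, 50, activation="linear"))
```

With the linear activation the result is exactly w raised to the number of steps: it collapses toward zero for |w| < 1 and explodes for |w| > 1. The tanh factor 1 − h² never exceeds 1, so it can only shrink the product further. ADMM-based training sidesteps this long chain of multiplied derivatives.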

Abstract

It is hard to train a Recurrent Neural Network (RNN) with stable convergence while avoiding the gradient vanishing and exploding problems, because the weights in the recurrent unit are shared across timesteps. Moreover, RNNs are sensitive to the initialization of weights and biases, which makes training difficult. The Alternating Direction Method of Multipliers (ADMM) has become a promising alternative to traditional stochastic gradient algorithms for training neural networks, owing to its gradient-free nature and its immunity to unsatisfactory conditions. However, ADMM cannot be applied directly to train RNNs, since the state in the recurrent unit is updated repeatedly over timesteps. Therefore, this work builds a new framework named ADMMiRNN upon the unfolded form of RNN to address the above challenges simultaneously. We also provide novel update rules and a theoretical convergence analysis.…
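The "unfolded form" mentioned in the abstract can be pictured with a hypothetical scalar RNN h_t = tanh(u·x_t + w·h_{t−1} + b). Unrolling it over the sequence exposes a pre-activation a_t and a state h_t at every timestep, linked by equality constraints, while u, w, and b are the same at every step. One common way ADMM-style training uses such a structure is to treat the per-step variables as separate blocks coupled only through those constraints; this is an assumption for illustration, and the exact ADMMiRNN variables and update rules are given in the paper. The names `unfold_rnn` and `constraint_residuals` are hypothetical.

```python
import math


def unfold_rnn(xs: list[float], u: float, w: float, b: float,
               h0: float = 0.0) -> tuple[list[float], list[float]]:
    """Unroll the toy scalar RNN h_t = tanh(u*x_t + w*h_{t-1} + b) over all timesteps."""
    h = h0
    pre_acts, states = [], []
    for x in xs:
        a = u * x + w * h + b  # pre-activation at step t; u, w, b are shared across steps
        h = math.tanh(a)
        pre_acts.append(a)
        states.append(h)
    return pre_acts, states


def constraint_residuals(xs: list[float], pre_acts: list[float], states: list[float],
                         u: float, w: float, b: float, h0: float = 0.0) -> float:
    """Largest violation of a_t = u*x_t + w*h_{t-1} + b and h_t = tanh(a_t).

    In an ADMM splitting these equalities would be enforced via penalty and
    dual terms rather than by running the forward pass exactly.
    """
    worst = 0.0
    h_prev = h0
    for x, a, h in zip(xs, pre_acts, states):
        worst = max(worst, abs(a - (u * x + w * h_prev + b)), abs(h - math.tanh(a)))
        h_prev = h
    return worst
```

An exact forward pass satisfies every constraint, so `constraint_residuals` returns 0.0 for the output of `unfold_rnn`; perturbing any single state makes it positive. That per-timestep decoupling is what makes alternating, and potentially parallel, updates possible.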


Code & Models

Repositories

TonyTangYu/ADMMiRNN (official)


Taxonomy

Topics: Machine Learning and ELM · Neural Networks and Applications · Advanced Neural Network Applications

Methods: Alternating Direction Method of Multipliers (ADMM)