On the Initialization of Long Short-Term Memory Networks

Mostafa Mehdipour Ghazi; Mads Nielsen; Akshay Pai; Marc Modat; M. Jorge Cardoso; Sebastien Ourselin; Lauge Sorensen

arXiv:1912.10454·cs.LG·December 24, 2019

TL;DR

This paper introduces a weight initialization method for LSTM networks that improves training stability and convergence, outperforming existing initialization techniques on univariate time series regression and multivariate disease progression modeling.

Contribution

A robust initialization approach based on normalized random weights that keeps the variance of the network input and output in the same range, improving LSTM training stability and performance.

Findings

01. Outperforms state-of-the-art initialization methods

02. Improves training convergence speed

03. Enhances generalization in time series tasks

Abstract

Weight initialization is important for faster convergence and stability when training deep neural networks. In this paper, a robust initialization method is developed to address training instability in long short-term memory (LSTM) networks. It is based on a normalized random initialization of the network weights that aims to keep the variance of the network input and output in the same range. The method is applied to standard LSTMs for univariate time series regression and to LSTMs robust to missing values for multivariate disease progression modeling. The results show that in all cases, the proposed initialization method outperforms state-of-the-art initialization techniques in terms of training convergence and generalization performance of the obtained solution.
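
The abstract does not state the paper's formulas, so the following is only a rough illustration of the general idea of a normalized random initialization that keeps input and output variance in a similar range. It uses the familiar Glorot-style uniform scheme as a stand-in, which is an assumption and not the authors' method. The helper names `normalized_uniform` and `init_lstm_weights`, the gate labels, and the zero biases are all hypothetical choices made for this sketch.

```python
import numpy as np


def normalized_uniform(fan_in, fan_out, rng):
    """Glorot-style uniform init (a stand-in, not the paper's scheme).

    Samples from U(-a, a) with a = sqrt(6 / (fan_in + fan_out)),
    which gives Var(w) = a^2 / 3 = 2 / (fan_in + fan_out).
    """
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_out, fan_in))


def init_lstm_weights(input_size, hidden_size, seed=0):
    """Hypothetical helper: one input matrix W, one recurrent matrix U,
    and one zero bias vector b per LSTM gate."""
    rng = np.random.default_rng(seed)
    gates = ("input", "forget", "cell", "output")
    return {
        gate: {
            "W": normalized_uniform(input_size, hidden_size, rng),
            "U": normalized_uniform(hidden_size, hidden_size, rng),
            "b": np.zeros(hidden_size),
        }
        for gate in gates
    }


# Example: weights for an LSTM with 8 inputs and 16 hidden units.
weights = init_lstm_weights(input_size=8, hidden_size=16)
```

Scaling the sampling range by the layer's fan-in and fan-out is what ties the weight variance to the layer size; the method described in the abstract may normalize differently.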
