Optimizing and Contrasting Recurrent Neural Network Architectures
Ben Krause

TL;DR
This paper investigates optimization techniques and architectures for RNNs, demonstrating that Hessian free optimization and a new multiplicative LSTM hybrid improve performance on character prediction tasks.
Contribution
It introduces a novel multiplicative LSTM hybrid architecture and evaluates the effectiveness of Hessian free optimization for training various RNN models.
Findings
Multiplicative LSTM outperforms standard LSTM and multiplicative RNNs.
Hessian free optimization effectively trains complex RNN architectures.
The new hybrid model achieves competitive results with state-of-the-art RNNs.
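As a rough illustration of the idea behind Hessian free optimization, the sketch below takes damped Newton-style steps on a toy two-parameter objective without ever building the Hessian: conjugate gradient only needs Hessian-vector products, which are approximated here by finite differences of the gradient. The toy objective, the function names, and the fixed damping value are all assumptions made for illustration and are not taken from the paper; practical implementations for RNNs typically compute curvature-vector products exactly and adapt the damping during training.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def grad(w):
    # Gradient of a toy convex objective (an assumption, not from the paper):
    # f(w) = 0.5 * w^T A w - b^T w + 0.25 * sum(w_i ** 4)
    A = [[3.0, 1.0], [1.0, 2.0]]
    b = [1.0, 1.0]
    return [dot(row, w) - bi + wi ** 3 for row, bi, wi in zip(A, b, w)]

def hessian_vector_product(grad_fn, w, v, eps=1e-5):
    # Central finite difference of the gradient along v; the Hessian is never formed.
    norm = math.sqrt(dot(v, v))
    if norm == 0.0:
        return [0.0] * len(v)
    u = [vi / norm for vi in v]
    g_plus = grad_fn([wi + eps * ui for wi, ui in zip(w, u)])
    g_minus = grad_fn([wi - eps * ui for wi, ui in zip(w, u)])
    return [norm * (gp - gm) / (2.0 * eps) for gp, gm in zip(g_plus, g_minus)]

def conjugate_gradient(apply_A, b, max_iters=20, tol=1e-18):
    # Solves A x = b for a symmetric positive definite operator given only x -> A x.
    x = [0.0] * len(b)
    r = list(b)
    p = list(b)
    rs = dot(r, r)
    for _ in range(max_iters):
        if rs < tol:
            break
        Ap = apply_A(p)
        alpha = rs / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x

def hessian_free_minimize(grad_fn, w, steps=10, damping=1e-3):
    for _ in range(steps):
        g = grad_fn(w)

        def damped_hvp(v, w=w):
            # (H + damping * I) v, using only gradient evaluations.
            hv = hessian_vector_product(grad_fn, w, v)
            return [h + damping * vi for h, vi in zip(hv, v)]

        # Newton-style step: solve (H + damping * I) d = -g with conjugate gradient.
        d = conjugate_gradient(damped_hvp, [-gi for gi in g])
        w = [wi + di for wi, di in zip(w, d)]
    return w

w_opt = hessian_free_minimize(grad, [0.0, 0.0])
print(w_opt, grad(w_opt))
```

The gradient at the returned point should be close to zero. The sketch omits line search and mini-batching, which a real training setup would likely need.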
Abstract
Recurrent Neural Networks (RNNs) have long been recognized for their potential to model complex time series. However, it remains to be determined what optimization techniques and recurrent architectures can be used to best realize this potential. The experiments presented take a deep look into Hessian free optimization, a powerful second order optimization method that has shown promising results, but still does not enjoy widespread use. This algorithm was used to train a number of RNN architectures including standard RNNs, long short-term memory, multiplicative RNNs, and stacked RNNs on the task of character prediction. The insights from these experiments led to the creation of a new multiplicative LSTM hybrid architecture that outperformed both LSTM and multiplicative RNNs. When tested on a larger scale, multiplicative LSTM achieved character level modelling results competitive with…
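The abstract does not spell out the equations of the multiplicative LSTM hybrid. A minimal sketch of one forward step is given below, assuming a formulation in which a multiplicative intermediate state (the elementwise product of an input projection and a hidden-state projection, as in multiplicative RNNs) replaces the previous hidden state as the recurrent input to standard LSTM gates. The parameter names, the omission of biases, and the initialization are illustrative assumptions, not the author's implementation.

```python
import math
import random

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

def init_params(n_in, n_hid, seed=0):
    # Hypothetical parameter layout; biases are omitted for brevity.
    rng = random.Random(seed)
    p = {
        "Wmx": rand_matrix(n_hid, n_in, rng),
        "Wmh": rand_matrix(n_hid, n_hid, rng),
    }
    for gate in ("i", "f", "o", "c"):
        p["W" + gate + "x"] = rand_matrix(n_hid, n_in, rng)
        p["W" + gate + "m"] = rand_matrix(n_hid, n_hid, rng)
    return p

def mlstm_step(p, x, h_prev, c_prev):
    # Multiplicative intermediate state: the input modulates the hidden-state projection.
    m = [a * b for a, b in zip(matvec(p["Wmx"], x), matvec(p["Wmh"], h_prev))]

    def pre_activation(gate):
        from_x = matvec(p["W" + gate + "x"], x)
        from_m = matvec(p["W" + gate + "m"], m)
        return [a + b for a, b in zip(from_x, from_m)]

    i = [sigmoid(z) for z in pre_activation("i")]    # input gate
    f = [sigmoid(z) for z in pre_activation("f")]    # forget gate
    o = [sigmoid(z) for z in pre_activation("o")]    # output gate
    g = [math.tanh(z) for z in pre_activation("c")]  # candidate cell update

    c = [fi * ci + ii * gi for fi, ci, ii, gi in zip(f, c_prev, i, g)]
    h = [oi * math.tanh(ci) for oi, ci in zip(o, c)]
    return h, c

# Usage: feed two one-hot "characters" from a 3-symbol alphabet through 4 hidden units.
params = init_params(n_in=3, n_hid=4)
h, c = [0.0] * 4, [0.0] * 4
for x in ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]):
    h, c = mlstm_step(params, x, h, c)
print(h)
```

In this formulation the effective recurrent transition depends on the current input through m, which is the property the hybrid borrows from multiplicative RNNs, while the gating and cell state come from the LSTM.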
Taxonomy
Topics: Neural Networks and Applications · Time Series Analysis and Forecasting · Stock Market Forecasting Methods
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory
