Optimizing and Contrasting Recurrent Neural Network Architectures

Ben Krause

arXiv:1510.04953 · stat.ML · October 19, 2015 · 1 cites

TL;DR

This paper investigates optimization techniques and architectures for RNNs, demonstrating that Hessian free optimization and a new multiplicative LSTM hybrid improve performance on character prediction tasks.

Contribution

It introduces a novel multiplicative LSTM hybrid architecture and evaluates the effectiveness of Hessian free optimization for training various RNN models.
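
This page does not give the equations of the hybrid. As a rough illustration only, the sketch below shows one way a multiplicative LSTM step could be wired, assuming the factorized intermediate state of a multiplicative RNN, m = (W_mx x) * (W_mh h_prev), takes the place of the previous hidden state as input to otherwise standard LSTM gates. The parameter names, the omission of biases, and the tanh candidate are all assumptions made for this sketch, not details confirmed by the source.

```python
import math
import random


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def init_params(n_in, n_hid, seed=0):
    """Hypothetical parameter layout: biases are omitted for brevity."""
    rng = random.Random(seed)
    p = {
        "W_mx": init_matrix(n_hid, n_in, rng),
        "W_mh": init_matrix(n_hid, n_hid, rng),
    }
    for gate in ("i", "f", "o", "g"):
        p["W_" + gate + "x"] = init_matrix(n_hid, n_in, rng)
        p["W_" + gate + "m"] = init_matrix(n_hid, n_hid, rng)
    return p


def mlstm_step(p, x, h_prev, c_prev):
    """One step of the assumed hybrid: gates read (x, m) instead of (x, h_prev)."""
    # Multiplicative intermediate state: input-dependent rescaling of the recurrence.
    m = [a * b for a, b in zip(matvec(p["W_mx"], x), matvec(p["W_mh"], h_prev))]

    def pre(gate):
        wx = matvec(p["W_" + gate + "x"], x)
        wm = matvec(p["W_" + gate + "m"], m)
        return [a + b for a, b in zip(wx, wm)]

    i = [sigmoid(z) for z in pre("i")]    # input gate
    f = [sigmoid(z) for z in pre("f")]    # forget gate
    o = [sigmoid(z) for z in pre("o")]    # output gate
    g = [math.tanh(z) for z in pre("g")]  # candidate cell update (assumed tanh)

    c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c_prev, i, g)]
    h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
    return h, c


# Usage: run two steps on one-hot "characters" from a 3-symbol alphabet.
params = init_params(n_in=3, n_hid=4)
h, c = [0.0] * 4, [0.0] * 4
for x in ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]):
    h, c = mlstm_step(params, x, h, c)
```

The point of this wiring is that each input symbol selects its own effective recurrent transition through m, while the LSTM gating still controls what is stored in the cell.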

Findings

01

Multiplicative LSTM outperforms standard LSTM and multiplicative RNNs.

02

Hessian free optimization effectively trains complex RNN architectures.

03

The new hybrid model achieves results competitive with state-of-the-art RNNs.

Abstract

Recurrent Neural Networks (RNNs) have long been recognized for their potential to model complex time series. However, it remains to be determined what optimization techniques and recurrent architectures can be used to best realize this potential. The experiments presented take a deep look into Hessian free optimization, a powerful second-order optimization method that has shown promising results, but still does not enjoy widespread use. This algorithm was used to train a number of RNN architectures including standard RNNs, long short-term memory, multiplicative RNNs, and stacked RNNs on the task of character prediction. The insights from these experiments led to the creation of a new multiplicative LSTM hybrid architecture that outperformed both LSTM and multiplicative RNNs. When tested on a larger scale, multiplicative LSTM achieved character-level modelling results competitive with…
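
The abstract names Hessian free optimization without describing its mechanics. As a rough illustration of the general idea, and not the paper's implementation, the sketch below takes one step on a toy quadratic: it solves a damped Newton system by conjugate gradient, touching the Hessian only through Hessian-vector products so the full matrix is never formed. Two assumptions are made for brevity: finite-difference products stand in for the exact curvature-vector products normally used in practice, and `grad_fn`, `hf_step`, `toy_grad`, and the damping value are hypothetical names and settings.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def axpy(alpha, x, y):
    """Return alpha * x + y elementwise."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]


def hessian_vector_product(grad_fn, theta, v, eps=1e-5):
    """Approximate H @ v by a central finite difference of the gradient."""
    g_plus = grad_fn(axpy(eps, v, theta))
    g_minus = grad_fn(axpy(-eps, v, theta))
    return [(p - m) / (2 * eps) for p, m in zip(g_plus, g_minus)]


def hf_step(grad_fn, theta, damping=1e-3, cg_iters=50, tol=1e-10):
    """Solve (H + damping * I) d = -g by conjugate gradient; return theta + d."""
    g = grad_fn(theta)
    d = [0.0] * len(theta)
    r = [-gi for gi in g]  # residual of the linear system at d = 0
    p = list(r)            # first search direction
    rs = dot(r, r)
    for _ in range(cg_iters):
        if rs < tol:
            break
        Hp = hessian_vector_product(grad_fn, theta, p)
        Ap = axpy(damping, p, Hp)  # (H + damping * I) @ p
        alpha = rs / dot(p, Ap)
        d = axpy(alpha, p, d)
        r = axpy(-alpha, Ap, r)
        rs_new = dot(r, r)
        p = axpy(rs_new / rs, p, r)
        rs = rs_new
    return axpy(1.0, d, theta)


def toy_grad(theta):
    """Gradient of 0.5 * (3x^2 + 2xy + 4y^2) - x - 2y."""
    x, y = theta
    return [3 * x + y - 1, x + 4 * y - 2]


theta = hf_step(toy_grad, [0.0, 0.0])
```

On this quadratic the exact minimizer is (2/11, 5/11), and a single step lands close to it, with a small offset introduced by the damping term. For an RNN the same loop applies, but the curvature products would come from the network's loss rather than a hand-written gradient.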

Taxonomy

Topics: Neural Networks and Applications · Time Series Analysis and Forecasting · Stock Market Forecasting Methods

Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory