Tikhonov Regularization for Long Short-Term Memory Networks

Andrei Turkin

arXiv:1708.02979 · cs.LG · August 11, 2017 · 1 citation

Open Access

TL;DR

This paper derives a Tikhonov regularizer tailored for LSTM networks, aiming to improve training stability and performance by controlling weight interactions and data perturbations.

Contribution

It introduces a novel Tikhonov regularizer for LSTM networks that accounts for weight interactions and simplifies regularization with a single data perturbation parameter.

Findings

1. Regularizer improves training stability.
2. It effectively controls weight interactions.
3. Applicable to various recurrent neural networks.

Abstract

It is well known that adding noise to the input data often improves network performance. Dropout can cause memory loss when it is applied to recurrent connections, whereas Tikhonov regularization, which can be regarded as training with additive noise, avoids this issue naturally, though it requires a regularizer to be derived separately for each architecture. For feedforward neural networks this is straightforward, but for networks with recurrent connections and complicated layers it leads to some difficulties. In this paper, a Tikhonov regularizer is derived for Long Short-Term Memory (LSTM) networks. Although it is independent of time for simplicity, it considers interaction between the weights of the LSTM unit, which in theory makes it possible to regularize the unit with complicated dependencies by using only one parameter that measures the input data…
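The link between additive input noise and Tikhonov regularization mentioned in the abstract can be illustrated in the simplest setting, a linear least-squares model. The sketch below is not the LSTM regularizer derived in the paper; it only checks numerically that fitting on many noisy copies of the inputs gives roughly the same weights as a ridge (Tikhonov) penalty with strength n * sigma^2. The function names, data sizes, and noise level are assumptions chosen for illustration.

```python
import numpy as np

def ridge_solution(X, y, lam):
    """Closed-form Tikhonov (ridge) fit: argmin ||y - Xw||^2 + lam * ||w||^2."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def noisy_input_solution(X, y, sigma, n_copies, rng):
    """Plain least squares on many noise-perturbed copies of the inputs."""
    X_rep = np.tile(X, (n_copies, 1))   # stack n_copies copies of the data
    y_rep = np.tile(y, n_copies)        # targets repeated in the same order
    X_noisy = X_rep + sigma * rng.standard_normal(X_rep.shape)
    w, *_ = np.linalg.lstsq(X_noisy, y_rep, rcond=None)
    return w

# Hypothetical toy data (not from the paper).
rng = np.random.default_rng(0)
n, d, sigma = 200, 5, 0.5
X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
y = X @ w_true + 0.1 * rng.standard_normal(n)

w_noise = noisy_input_solution(X, y, sigma, n_copies=500, rng=rng)
w_ridge = ridge_solution(X, y, lam=n * sigma**2)  # penalty implied by the noise
w_plain = ridge_solution(X, y, lam=0.0)           # unregularized baseline

print("noise vs ridge:", np.linalg.norm(w_noise - w_ridge))
print("noise vs plain:", np.linalg.norm(w_noise - w_plain))
```

In this linear case the expected squared error under Gaussian input noise equals the clean squared error plus n * sigma^2 * ||w||^2, which is why a single noise parameter sigma fixes the penalty strength. For nonlinear and recurrent models the correspondence is only approximate and the penalty takes an architecture-specific form, which is the derivation the paper addresses for the LSTM unit.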

Taxonomy

Topics: Neural Networks and Applications · Model Reduction and Neural Networks · Image and Signal Denoising Methods

Methods: Sigmoid Activation · Tanh Activation · Dropout · Long Short-Term Memory