Fast Saturating Gate for Learning Long Time Scales with Recurrent Neural Networks

Kentaro Ohno; Sekitoshi Kanai; Yasutoshi Ida

arXiv:2210.01348 · cs.LG · October 5, 2022

TL;DR

This paper introduces a novel fast gate function for recurrent neural networks that accelerates convergence, mitigates gradient vanishing, and improves learning of extremely long time scales in time series data.

Contribution

The paper proposes a new gate function, the fast gate, whose output converges to 0 or 1 at a doubly exponential rate with respect to its input. This improves training efficiency and accuracy when modeling long time scales with RNNs.
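To make "doubly exponential" concrete, the bounds below compare the gap to 1 for the standard sigmoid gate with the gap for a composed gate. The specific composition σ(sinh z) is an assumption made here for illustration; this page only says that the fast gate is obtained "by simple function composition" and does not give its exact form.

```latex
\begin{align*}
1 - \sigma(z) &= \frac{1}{1 + e^{z}} \le e^{-z}
  && \text{(exponential in } z\text{)} \\
1 - \sigma(\sinh z) &= \frac{1}{1 + e^{\sinh z}} \le e^{-\sinh z}
  = \exp\!\left(-\tfrac{1}{2}\left(e^{z} - e^{-z}\right)\right)
  && \text{(doubly exponential in } z\text{)}
\end{align*}
```

For large z the second bound behaves like exp(-e^z / 2), so under this assumed form the gate approaches 1 far sooner than the sigmoid does. By symmetry, since sinh is odd, the same holds for the approach to 0 as z goes to negative infinity.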

Findings

1. Outperforms previous methods in accuracy on long time scale tasks.

2. Achieves better computational efficiency.

3. Effectively mitigates gradient vanishing in gate functions.

Abstract

Gate functions in recurrent models, such as the LSTM and GRU, play a central role in learning various time scales in modeling time series data by using a bounded activation function. However, it is difficult to train gates to capture extremely long time scales due to gradient vanishing of the bounded function for large inputs, which is known as the saturation problem. We closely analyze the relation between the saturation of the gate function and the efficiency of training. We prove that the gradient vanishing of the gate function can be mitigated by accelerating the convergence of the saturating function, i.e., making the output of the function converge to 0 or 1 faster. Based on the analysis results, we propose a gate function called fast gate that has a doubly exponential convergence rate with respect to inputs by simple function composition. We empirically show that our method…
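The saturation problem described in the abstract can be illustrated numerically. The sketch below is not the authors' code. It assumes the fast gate is the composition sigmoid(sinh(z)), which this page does not confirm, and it uses the common rule of thumb that a forget gate value f corresponds to a time scale of about 1 / (1 - f) steps. All function names are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic sigmoid."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fast_gate(z: float) -> float:
    """ASSUMED form of the fast gate: sigmoid composed with sinh."""
    return sigmoid(math.sinh(z))


def gap_sigmoid(z: float) -> float:
    """1 - sigmoid(z), computed as sigmoid(-z) to avoid cancellation."""
    return sigmoid(-z)


def gap_fast(z: float) -> float:
    """1 - fast_gate(z); sinh is odd, so this equals sigmoid(-sinh(z))."""
    return sigmoid(-math.sinh(z))


def required_input_sigmoid(time_scale: float) -> float:
    """Input z at which 1 / (1 - sigmoid(z)) equals time_scale."""
    return math.log(time_scale - 1.0)


def required_input_fast(time_scale: float) -> float:
    """Input z at which 1 / (1 - fast_gate(z)) equals time_scale."""
    return math.asinh(math.log(time_scale - 1.0))


def grad_sigmoid(z: float) -> float:
    """d/dz sigmoid(z) = sigmoid(z) * (1 - sigmoid(z))."""
    return sigmoid(z) * sigmoid(-z)


def grad_fast(z: float) -> float:
    """d/dz sigmoid(sinh(z)) by the chain rule: extra factor cosh(z)."""
    s = math.sinh(z)
    return sigmoid(s) * sigmoid(-s) * math.cosh(z)


# Compare the input each gate needs to reach a given time scale,
# and the gate gradient available at that input.
for T in (1e2, 1e4, 1e6):
    zs, zf = required_input_sigmoid(T), required_input_fast(T)
    print(f"T={T:.0e}  sigmoid: z={zs:.3f} grad={grad_sigmoid(zs):.2e}  "
          f"fast: z={zf:.3f} grad={grad_fast(zf):.2e}")
```

Under these assumptions, both gates produce the same output at the printed inputs, but the composed gate gets there with a smaller input and keeps a gradient that is larger by the factor cosh(z). This is in line with the abstract's claim that faster convergence mitigates gradient vanishing, although the paper's own analysis may be stated differently.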

Taxonomy

Topics: Neural Networks and Applications · Model Reduction and Neural Networks · Gaussian Processes and Bayesian Inference

Methods: Gated Recurrent Unit · Tanh Activation · Sigmoid Activation · Long Short-Term Memory