Bifurcations and loss jumps in RNN training
Lukas Eisenmann, Zahra Monfared, Niclas Alexander Göring, Daniel Durstewitz

TL;DR
This paper investigates bifurcations in ReLU-based RNNs, mathematically links them to loss jumps during training, and introduces an exact algorithm for detecting fixed points and cycles, enhancing understanding of RNN dynamics and training behavior.
Contribution
It provides a mathematical proof connecting bifurcations to loss gradients and introduces a novel heuristic algorithm that exactly locates fixed points and cycles in ReLU-based RNNs, improving the tools available for analyzing them.
Findings
Bifurcations are linked to loss jumps in RNN training.
The new algorithm accurately finds fixed points and cycles in ReLU RNNs.
Generalized teacher forcing avoids certain bifurcations during training.
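To make the idea of exact fixed-point detection concrete, here is a minimal sketch. It assumes a piecewise-linear ReLU RNN of the form z_{t+1} = A z_t + W max(0, z_t) + h, a form this summary does not state. The function name `fixed_points_relu_rnn` and the brute-force enumeration of all 2^n linear regions are illustrative choices, not the paper's heuristic algorithm. Within each region the map is linear, so a candidate fixed point comes from a linear solve and is kept only if it actually lies in the region that produced it.

```python
import itertools
import numpy as np

def fixed_points_relu_rnn(A, W, h, tol=1e-9):
    """Brute-force fixed points of z -> A z + W relu(z) + h (assumed model form).

    Returns a list of (z_star, is_stable) pairs. Illustrative only: the cost
    grows as 2**n, so this is feasible for very small n.
    """
    n = len(h)
    found = []
    for signs in itertools.product([0.0, 1.0], repeat=n):
        D = np.diag(signs)                 # ReLU pattern of this linear region
        M = np.eye(n) - A - W @ D          # fixed point solves M z = h
        if abs(np.linalg.det(M)) < tol:    # skip (near-)singular regions
            continue
        z = np.linalg.solve(M, h)
        # Keep the candidate only if its sign pattern matches the region.
        if np.array_equal((z > 0).astype(float), np.array(signs)):
            eig = np.linalg.eigvals(A + W @ D)
            found.append((z, bool(np.max(np.abs(eig)) < 1.0)))
    return found

# Small hypothetical example (parameters chosen by hand, not from the paper).
A = np.diag([0.5, 0.5])
W = np.array([[0.0, -0.5], [0.5, 0.0]])
h = np.array([1.0, -2.0])
fps = fixed_points_relu_rnn(A, W, h)
for z_star, stable in fps:
    print(z_star, "stable" if stable else "unstable")
```

In this example only the region with the first unit active and the second inactive yields a consistent solution, a single stable fixed point at z = (2, -2). Tracking how such a list changes as A, W, or h vary is one way to see a bifurcation: fixed points appear, vanish, or change stability.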
Abstract
Recurrent neural networks (RNNs) are popular machine learning tools for modeling and forecasting sequential data and for inferring dynamical systems (DS) from observed time series. Concepts from DS theory (DST) have variously been used to further our understanding of both how trained RNNs solve complex tasks and the training process itself. Bifurcations are particularly important phenomena in DS, including RNNs, that refer to topological (qualitative) changes in a system's dynamical behavior as one or more of its parameters are varied. Knowing the bifurcation structure of an RNN will thus allow one to deduce many of its computational and dynamical properties, like its sensitivity to parameter variations or its behavior during training. In particular, bifurcations may account for sudden loss jumps observed in RNN training that could severely impede the training process. Here we first…
Taxonomy
Topics: Model Reduction and Neural Networks · Neural Networks and Applications
