Recurrent Neural Networks in the Eye of Differential Equations
Murphy Yuezhen Niu, Lior Horesh, Isaac Chuang

TL;DR
This paper analyzes recurrent neural networks through the lens of ordinary differential equations, establishing a theoretical framework that links RNN architecture properties to ODE integration methods, leading to new design insights and a quantum-inspired neural network.
Contribution
The paper introduces the ODERNN framework, which connects RNNs with ODE integration methods and enables systematic analysis and new architecture designs such as the QUNN.
Findings
Popular RNNs such as LSTM and URNN fit into ODERNN classes of different orders.
The framework provides conditions for RNN training stability.
QUNN reduces the number of training parameters so that it scales linearly with memory length.
Abstract
To understand the fundamental trade-offs between training stability, temporal dynamics and architectural complexity of recurrent neural networks (RNNs), we directly analyze RNN architectures using numerical methods of ordinary differential equations (ODEs). We define a general family of RNNs, the ODERNNs, by relating the composition rules of RNNs to integration methods of ODEs at discrete time steps. We show that the degree of RNN's functional nonlinearity and the range of its temporal memory can be mapped to the corresponding stage of Runge-Kutta recursion and the order of time-derivative of the ODEs. We prove that popular RNN architectures, such as LSTM and URNN, fit into different orders of ODERNNs. This exact correspondence between RNN and ODE helps us to establish the sufficient conditions for RNN training stability and facilitates more flexible top-down designs of…
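The correspondence the abstract describes, between an RNN update rule and an ODE integration step, can be illustrated with a minimal sketch. This is not the paper's ODERNN definition. The vector field `f`, the weights `w_h` and `w_x`, and the step size `dt` are assumptions chosen for illustration. Under these assumptions, a forward Euler step with `dt = 1` reduces to the familiar tanh RNN update, and a two-stage Runge-Kutta (midpoint) step applies the nonlinearity twice per time step.

```python
import math


def f(h, x, w_h, w_x):
    """Hypothetical vector field dh/dt = tanh(W_h h + w_x * x) - h.

    This form is an assumption for illustration, not taken from the paper.
    """
    n = len(h)
    return [
        math.tanh(sum(w_h[i][j] * h[j] for j in range(n)) + w_x[i] * x) - h[i]
        for i in range(n)
    ]


def euler_cell(h, x, w_h, w_x, dt):
    """One forward Euler step: h_next = h + dt * f(h, x)."""
    k1 = f(h, x, w_h, w_x)
    return [hi + dt * ki for hi, ki in zip(h, k1)]


def rk2_cell(h, x, w_h, w_x, dt):
    """One midpoint (two-stage Runge-Kutta) step; f is evaluated twice."""
    k1 = f(h, x, w_h, w_x)
    h_mid = [hi + 0.5 * dt * ki for hi, ki in zip(h, k1)]
    k2 = f(h_mid, x, w_h, w_x)
    return [hi + dt * ki for hi, ki in zip(h, k2)]


def run(cell, xs, h0, w_h, w_x, dt):
    """Unroll a cell over an input sequence and return the final hidden state."""
    h = list(h0)
    for x in xs:
        h = cell(h, x, w_h, w_x, dt)
    return h
```

Calling `run(euler_cell, xs, h0, w_h, w_x, 1.0)` behaves like a plain tanh RNN over the scalar sequence `xs`. Swapping in `rk2_cell` nests one extra nonlinearity per step without adding parameters. This is one simple way to see how an integration scheme's stage count can relate to functional nonlinearity.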
Taxonomy
Topics: Model Reduction and Neural Networks · Neural Networks and Applications · Neural Networks and Reservoir Computing
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory
