TL;DR
This paper introduces an efficient second-order training algorithm for LSTM networks based on the Extended Kalman filter (EKF). In online (adaptive) regression tasks it is 10 to 45% more accurate than Adam, RMSprop, and DEKF, and it matches EKF's accuracy at a 10 to 15 times lower run-time.
Contribution
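The paper presents an efficient EKF-based second-order training algorithm for LSTM networks that is fully online: it assumes no underlying data-generating process and no future information, only that the target sequence is bounded. It is more accurate than the widely used adaptive methods Adam, RMSprop, and DEKF, and much faster than EKF at comparable accuracy.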
Findings
10-45% accuracy improvement over Adam, RMSprop, and DEKF
10-15 times lower run-time than EKF at comparable accuracy
Consistent performance gains across experiments
Abstract
We study adaptive (or online) nonlinear regression with Long Short-Term Memory (LSTM) based networks, i.e., LSTM-based adaptive learning. In this context, we introduce an efficient Extended Kalman filter (EKF) based second-order training algorithm. Our algorithm is truly online, i.e., it does not assume any underlying data-generating process or future information, except that the target sequence is bounded. Through an extensive set of experiments, we demonstrate significant performance gains achieved by our algorithm with respect to the state-of-the-art methods. Here, we mainly show that our algorithm consistently provides a 10 to 45% improvement in accuracy compared to the widely used adaptive methods Adam, RMSprop, and DEKF, and comparable performance to EKF with a 10 to 15 times reduction in run-time.
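For readers unfamiliar with EKF-based training, the sketch below shows the general idea of treating model weights as the state of an Extended Kalman filter and updating them one sample at a time. It is not the paper's algorithm: the model is a toy single-unit `tanh` regressor rather than an LSTM, and the names `ekf_step` and `run_demo` and the noise settings `r` and `q` are assumptions chosen for illustration. Note that a full EKF keeps a covariance matrix `P` with one row and column per weight, which is the cost that efficient variants aim to reduce.

```python
import numpy as np

def ekf_step(w, P, x, d, r=0.1, q=1e-6):
    """One generic EKF weight update for the toy model y = tanh(w @ x).

    w: weights (n,), P: weight covariance (n, n), x: input (n,), d: target.
    r and q are assumed measurement and process noise levels.
    """
    y = np.tanh(w @ x)                 # model prediction
    H = (1.0 - y ** 2) * x             # gradient of y with respect to w
    PH = P @ H
    s = H @ PH + r                     # innovation variance (scalar)
    K = PH / s                         # Kalman gain, shape (n,)
    w = w + K * (d - y)                # second-order style weight update
    # Covariance update; P is symmetric, so K H P equals outer(K, P H)
    P = P - np.outer(K, PH) + q * np.eye(len(w))
    return w, P, y

def run_demo(steps=500, seed=0):
    """Online regression on synthetic data, one sample per step."""
    rng = np.random.default_rng(seed)
    w_true = np.array([0.8, -0.5, 0.3])    # hypothetical target weights
    w, P = np.zeros(3), np.eye(3)
    sq_errors = []
    for _ in range(steps):
        x = rng.normal(size=3)
        d = np.tanh(w_true @ x) + 0.01 * rng.normal()
        w, P, y = ekf_step(w, P, x, d)     # predict, then update
        sq_errors.append((d - y) ** 2)     # error before the update
    return w, w_true, np.array(sq_errors)

if __name__ == "__main__":
    w, w_true, errs = run_demo()
    print("estimated weights:", w)
    print("mean squared error, first 20 steps:", errs[:20].mean())
    print("mean squared error, last 100 steps:", errs[-100:].mean())
```

Each step uses only the current sample, which matches the online setting described in the abstract; the per-step cost of this generic form grows quadratically with the number of weights because of `P`.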
Taxonomy
Methods: Adam · Sigmoid Activation · Tanh Activation · Long Short-Term Memory
