Deconstructing Recurrence, Attention, and Gating: Investigating the transferability of Transformers and Gated Recurrent Neural Networks in forecasting of dynamical systems
Hunter S. Heidenreich, Pantelis R. Vlachas, Petros Koumoutsakos

TL;DR
This study dissects core components of transformers and RNNs, demonstrating how gating, recurrence, and attention mechanisms influence forecasting accuracy across various dynamical systems and benchmarks.
Contribution
It introduces a systematic ablation analysis of neural architecture components, revealing their transferability and effectiveness in diverse forecasting tasks, and proposes a novel hybrid architecture.
Findings
Gating and attention mechanisms enhance RNN performance.
Recurrence in transformers can be detrimental.
Hybrid Recurrent Highway Networks outperform standard models.
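The findings single out Recurrent Highway Networks, where gating controls how much of the recurrent state is rewritten at each step. The sketch below shows one RHN time step. It follows the RHN formulation of Zilly et al. (2017) with a coupled carry gate (c = 1 − t). It is not the authors' hybrid architecture. The helper names (`init_params`, `rhn_step`), the plain-list vectors, and the initialisation choices are illustrative assumptions.

```python
import math
import random


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_params(input_size, hidden_size, depth, seed=0):
    """Hypothetical small random initialisation for an RHN cell with `depth` micro-layers."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        "W_H": mat(hidden_size, input_size),  # input -> candidate (first micro-layer only)
        "W_T": mat(hidden_size, input_size),  # input -> transform gate (first micro-layer only)
        "R_H": [mat(hidden_size, hidden_size) for _ in range(depth)],
        "R_T": [mat(hidden_size, hidden_size) for _ in range(depth)],
        "b_H": [[0.0] * hidden_size for _ in range(depth)],
        # Illustrative choice: a negative gate bias initially favours carrying the state.
        "b_T": [[-1.0] * hidden_size for _ in range(depth)],
    }


def rhn_step(x, s_prev, params):
    """One time step of a Recurrent Highway Network (coupled carry gate c = 1 - t)."""
    depth = len(params["R_H"])
    s = s_prev
    for layer in range(depth):
        # The input x enters only the first micro-layer (indicator l = 1 in Zilly et al.).
        in_h = matvec(params["W_H"], x) if layer == 0 else [0.0] * len(s)
        in_t = matvec(params["W_T"], x) if layer == 0 else [0.0] * len(s)
        rec_h = matvec(params["R_H"][layer], s)
        rec_t = matvec(params["R_T"][layer], s)
        h = [math.tanh(a + b + c) for a, b, c in zip(in_h, rec_h, params["b_H"][layer])]
        t = [sigmoid(a + b + c) for a, b, c in zip(in_t, rec_t, params["b_T"][layer])]
        # Gated update: a convex mix of the new candidate h and the carried state s.
        s = [ti * hi + (1.0 - ti) * si for ti, hi, si in zip(t, h, s)]
    return s


# Usage: unroll the cell over a short input sequence, starting from a zero state.
params = init_params(input_size=3, hidden_size=4, depth=2)
state = [0.0] * 4
for x_t in [[0.5, -0.2, 0.1], [0.3, 0.0, -0.4]]:
    state = rhn_step(x_t, state, params)
```

Because the carry gate is tied to the transform gate, each micro-layer can only interpolate between the previous state and a bounded candidate. This is the mechanism that lets gated recurrent models preserve information across many steps.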
Abstract
Machine learning architectures, including transformers and recurrent neural networks (RNNs), have revolutionized forecasting in applications ranging from text processing to extreme weather. Notably, advanced network architectures tuned for applications such as natural language processing are transferable to other tasks, such as spatiotemporal forecasting. However, there is a scarcity of ablation studies that identify the key components enabling this forecasting accuracy. The absence of such studies, although explainable by the associated computational cost, reinforces the belief that these models ought to be treated as black boxes. In this work, we decompose the key architectural components of the most powerful neural architectures, namely gating and recurrence in RNNs, and attention mechanisms in transformers. Then, we synthesize and build novel hybrid architectures…
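The attention mechanism the abstract isolates in transformers can be illustrated with the scaled dot-product attention of "Attention Is All You Need" (listed under Methods below). The sketch assumes a single head, no learned query/key/value projections, and no causal masking. It shows only how softmax-normalised similarity scores mix the value vectors. It does not reproduce the paper's transformer configuration. The function names are illustrative.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)  # subtract the max before exponentiating to avoid overflow
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Single-head attention: softmax(Q K^T / sqrt(d_k)) V, computed row by row."""
    d_k = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        w = softmax(scores)
        weights.append(w)
        # Output is the attention-weighted average of the value vectors.
        outputs.append(
            [sum(wj * v[i] for wj, v in zip(w, values)) for i in range(len(values[0]))]
        )
    return outputs, weights


# Usage: one query that resembles the first key more than the second.
queries = [[1.0, 0.0]]
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0]]
out, attn = scaled_dot_product_attention(queries, keys, values)
```

Each row of `attn` sums to one, so every output is a convex combination of the values. The query therefore pulls more strongly from the value paired with its most similar key.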
Taxonomy
Topics: Neural Networks and Applications · Complex Systems and Time Series Analysis
Methods: Softmax · Attention Is All You Need · Highway networks
