Comparing Generalization in Learning with Limited Numbers of Exemplars: Transformer vs. RNN in Attractor Dynamics
Rui Fukushima, Jun Tani

TL;DR
This paper compares the generalization-in-learning capacity of Transformer and RNN architectures in attractor dynamics tasks, revealing that RNNs outperform Transformers when data is limited, challenging assumptions about Transformer superiority.
Contribution
It provides a direct comparison of Transformers and RNNs in low-data scenarios for attractor dynamics learning, highlighting the limitations of Transformers under such conditions.
Findings
Transformers perform worse than RNNs with limited data.
RNNs show better generalization in attractor dynamics learning.
The results challenge the notion that Transformers are superior in all learning contexts.
Abstract
ChatGPT, a widely recognized large language model (LLM), has recently gained substantial attention for its performance scaling, attributed to the billions of web-sourced natural language sentences used for training. Its underlying architecture, the Transformer, has found applications across diverse fields, including video, audio signals, and robotic movement. This raises a crucial question about the Transformer's generalization-in-learning (GIL) capacity. Is ChatGPT's success chiefly due to the vast dataset used for training, or is there more to the story? To investigate this, we compared the Transformer's GIL capabilities with those of a traditional Recurrent Neural Network (RNN) in tasks involving attractor dynamics learning. For performance evaluation, the Dynamic Time Warping (DTW)…
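The abstract names Dynamic Time Warping (DTW) as the evaluation measure but is cut off before describing how it is applied. As a hedged illustration only, the sketch below implements the textbook DTW distance between two multivariate trajectories, for example a generated sequence and a target attractor trajectory. The function name `dtw_distance`, the Euclidean local cost, and the absence of a warping-window constraint are assumptions made for this sketch, not the paper's stated configuration.

```python
import math
from typing import Sequence


def dtw_distance(a: Sequence[Sequence[float]], b: Sequence[Sequence[float]]) -> float:
    """Classic dynamic time warping distance between two multivariate sequences.

    Assumptions (not taken from the paper): the local cost is the Euclidean
    distance between time steps, and no warping window is applied.
    Returns math.inf if exactly one sequence is empty.
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])  # Euclidean distance between frames
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances while b repeats a frame
                cost[i][j - 1],      # b advances while a repeats a frame
                cost[i - 1][j - 1],  # both advance together
            )
    return cost[n][m]


if __name__ == "__main__":
    target = [[0.0], [1.0], [2.0]]
    stretched = [[0.0], [0.0], [1.0], [2.0], [2.0]]
    # The time-stretch is absorbed by the warping path, so the distance is zero.
    print(dtw_distance(target, stretched))
```

Because DTW lets one sequence dwell on a time step while the other advances, a trajectory that differs from the target only in timing scores zero, whereas a pointwise comparison would penalize the mismatch. This tolerance to phase and speed differences is one plausible reason to use DTW when comparing generated trajectories against target attractor dynamics.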
Taxonomy
Topics: Time Series Analysis and Forecasting · Neural Networks and Applications · Advanced Text Analysis Techniques
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Dense Connections · Dropout · Softmax · Absolute Position Encodings · Residual Connection · Position-Wise Feed-Forward Layer · Label Smoothing
