Comparing Generalization in Learning with Limited Numbers of Exemplars: Transformer vs. RNN in Attractor Dynamics

Rui Fukushima; Jun Tani

arXiv:2311.10763·cs.CL·November 21, 2023·1 cites

TL;DR

This paper compares the generalization-in-learning (GIL) capacity of Transformer and RNN architectures on attractor dynamics learning tasks. It reports that RNNs outperform Transformers when training data is limited, which challenges the assumption that Transformers are superior in every setting.

Contribution

It provides a direct comparison of Transformer and RNN in low-data scenarios for attractor dynamics, highlighting limitations of Transformers in such conditions.

Findings

01. Transformers perform worse than RNNs with limited data.

02. RNNs show better generalization in attractor dynamics learning.

03. Results challenge the notion of Transformer superiority in all learning contexts.

Abstract

ChatGPT, a widely recognized large language model (LLM), has recently gained substantial attention for its performance scaling, attributed to the billions of web-sourced natural language sentences used for training. Its underlying architecture, Transformer, has found applications across diverse fields, including video, audio signals, and robotic movement. However, this raises a crucial question about the Transformer's generalization-in-learning (GIL) capacity. Is ChatGPT's success chiefly due to the vast dataset used for training, or is there more to the story? To investigate this, we compared the Transformer's GIL capabilities with those of a traditional Recurrent Neural Network (RNN) in tasks involving attractor dynamics learning. For performance evaluation, the Dynamic Time Warping (DTW)…
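
The abstract names Dynamic Time Warping (DTW) as the evaluation measure but is cut off before it says how DTW is applied. For readers unfamiliar with it, the following is a minimal sketch of the textbook DTW distance between two one-dimensional sequences. It is not the authors' implementation: the function name `dtw_distance`, the absolute-difference local cost, and the absence of any windowing constraint are all assumptions made for illustration.

```python
import math


def dtw_distance(a, b):
    """Textbook DTW distance between two 1-D numeric sequences.

    Hypothetical helper for illustration only. The local cost is the
    absolute difference |a[i] - b[j]| (an assumption; the paper's
    actual cost function is not stated in the visible abstract).
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor cells.
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b[j-1] is reused
                cost[i][j - 1],      # b advances, a[i-1] is reused
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]
```

Because DTW allows one sequence to be locally stretched against the other, `dtw_distance([0, 1, 2], [0, 1, 1, 2])` is zero even though the sequences differ in length. This tolerance to timing shifts is one reason DTW is often chosen for comparing generated and reference trajectories.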

Taxonomy

Topics: Time Series Analysis and Forecasting · Neural Networks and Applications · Advanced Text Analysis Techniques

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Dense Connections · Dropout · Softmax · Absolute Position Encodings · Residual Connection · Position-Wise Feed-Forward Layer · Label Smoothing