On Evaluating the Generalization of LSTM Models in Formal Languages

Mirac Suzgun; Yonatan Belinkov; Stuart M. Shieber

arXiv:1811.01001·cs.CL·November 5, 2018·6 cites

TL;DR

This paper empirically investigates how well LSTM models learn and generalize simple formal languages. It finds striking performance differences across training settings and argues that claims about neural network capabilities require careful evaluation.

Contribution

It provides an empirical analysis of LSTM generalization on formal languages, highlighting the impact of training regimes and model capacity on learning outcomes.

Findings

1. Performance varies significantly with training settings.
2. Careful assessment is essential for claims about neural network capabilities.
3. Different training data regimes influence generalization to unobserved samples.

Abstract

Recurrent Neural Networks (RNNs) are theoretically Turing-complete and have established themselves as a dominant model for language processing. Yet uncertainty remains regarding their language learning capabilities. In this paper, we empirically evaluate the inductive learning capabilities of Long Short-Term Memory networks, a popular extension of simple RNNs, to learn simple formal languages, in particular $a^{n} b^{n}$, $a^{n} b^{n} c^{n}$, and $a^{n} b^{n} c^{n} d^{n}$. We investigate the influence of various aspects of learning, such as training data regimes and model capacity, on the generalization to unobserved samples. We find striking differences in model performance under different training settings and highlight the need for careful analysis and assessment when making claims about the learning capabilities of neural network models.
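
To make the three languages in the abstract concrete, here is a minimal Python sketch that builds samples of $a^{n} b^{n}$, $a^{n} b^{n} c^{n}$, and $a^{n} b^{n} c^{n} d^{n}$, checks membership, and splits samples by length so that the test set contains only values of n never seen in training. The function names, the language keys, and the length thresholds are all hypothetical choices made for illustration; they are not taken from the paper or its repository, and the sketch assumes n >= 1.

```python
from itertools import groupby

# Hypothetical keys mapping each language to its ordered alphabet.
ALPHABETS = {"anbn": "ab", "anbncn": "abc", "anbncndn": "abcd"}


def make_sample(language: str, n: int) -> str:
    """Return the string with n copies of each symbol, in alphabet order."""
    if n < 1:
        raise ValueError("this sketch assumes n >= 1")
    return "".join(symbol * n for symbol in ALPHABETS[language])


def is_member(language: str, text: str) -> bool:
    """Check that text consists of equal-length runs of the symbols, in order."""
    runs = [(ch, len(list(group))) for ch, group in groupby(text)]
    if [ch for ch, _ in runs] != list(ALPHABETS[language]):
        return False
    return len({length for _, length in runs}) == 1


def split_by_length(language: str, train_max_n: int, test_max_n: int):
    """Train on n in [1, train_max_n]; test on the longer, unseen lengths."""
    train = [make_sample(language, n) for n in range(1, train_max_n + 1)]
    test = [make_sample(language, n)
            for n in range(train_max_n + 1, test_max_n + 1)]
    return train, test
```

For example, `split_by_length("anbn", 3, 5)` would train on `ab`, `aabb`, `aaabbb` and hold out the two longer strings, which is one simple way to probe generalization to unobserved samples.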

Code & Models

Repositories

suzgunmirac/lstm-eval (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Machine Learning and Algorithms