TL;DR
This paper investigates whether recurrent neural networks can recognize hierarchical structure in formal languages. They generalize well when test strings fall in the same length range as the training strings but struggle with longer ones; they can, however, recognize languages whose nesting depth is bounded, which may explain their effectiveness in natural language processing.
Contribution
The study shows that recurrent models are expressive enough to recognize bounded-depth Dyck languages and that they generalize to longer sequences as long as the nesting depth stays within the bound. This helps explain their success at modeling hierarchical structure in natural language.
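To make "bounded-depth Dyck language" concrete, here is a minimal Python sketch of a Dyck-n membership check with an optional depth cap. It is not the authors' code. The bracket alphabet `PAIRS` and the helper names `is_dyck` and `num_stack_states` are illustrative assumptions. The point it illustrates is that once the depth is capped at D, the only stack contents that can ever occur are the bracket sequences of length at most D. That gives 1 + n + ... + n^D possible contents, a finite number, so a finite-state machine can in principle track them. A finite-state machine needs only finite memory, which is the reason a finite-precision recurrent model can be expressive enough.

```python
from typing import Dict, List, Optional

# Hypothetical Dyck-2 alphabet: opening bracket -> matching closing bracket.
PAIRS: Dict[str, str] = {"(": ")", "[": "]"}


def is_dyck(word: str, pairs: Dict[str, str], max_depth: Optional[int] = None) -> bool:
    """Return True if `word` is a well-nested Dyck word over `pairs`.

    If `max_depth` is given, words whose nesting depth exceeds it are rejected,
    i.e. this checks membership in the depth-bounded Dyck language.
    """
    closers = {close: open_ for open_, close in pairs.items()}
    stack: List[str] = []
    for ch in word:
        if ch in pairs:
            stack.append(ch)
            if max_depth is not None and len(stack) > max_depth:
                return False  # nesting deeper than the bound
        elif ch in closers:
            if not stack or stack[-1] != closers[ch]:
                return False  # unmatched or mismatched closing bracket
            stack.pop()
        else:
            return False  # symbol outside the alphabet
    return not stack  # every opened bracket must be closed


def num_stack_states(n: int, max_depth: int) -> int:
    """Number of distinct stack contents of height <= max_depth over n bracket types."""
    return sum(n ** d for d in range(max_depth + 1))


print(is_dyck("([])", PAIRS))               # True
print(is_dyck("([)]", PAIRS))               # False (crossed brackets)
print(is_dyck("(())", PAIRS, max_depth=1))  # False (depth 2 exceeds bound 1)
print(num_stack_states(2, 3))               # 15
```

Without `max_depth`, the stack can grow without limit. That unbounded growth is what separates general Dyck-n from its bounded-depth restriction.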
Findings
Recurrent models perform well on same-range training and test lengths.
They struggle with longer test sequences, but generalize to longer sequences when the nesting depth is bounded.
Transformers show different generalization behaviors.
Abstract
While recurrent models have been effective in NLP tasks, their performance on context-free languages (CFLs) has been found to be quite weak. Given that CFLs are believed to capture important phenomena such as hierarchical structure in natural languages, this discrepancy in performance calls for an explanation. We study the performance of recurrent models on Dyck-n languages, a particularly important and well-studied class of CFLs. We find that while recurrent models generalize nearly perfectly if the lengths of the training and test strings are from the same range, they perform poorly if the test strings are longer. At the same time, we observe that recurrent models are expressive enough to recognize Dyck words of arbitrary lengths in finite precision if their depths are bounded. Hence, we evaluate our models on samples generated from Dyck languages with bounded depth and find that they…
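The evaluation setup described above (training on short strings, testing on longer ones, with the nesting depth held fixed) can be sketched with a simple sampler. The sketch below is hypothetical and is not the paper's sampling procedure. `sample_bounded_dyck` performs a random walk that opens or closes brackets while never exceeding `max_depth` and always leaving room to close every open bracket. Because each allowed move is picked with probability 1/2, the output is not uniform over all Dyck words of the given length. The length ranges in the usage example are illustrative assumptions.

```python
import random
from typing import List, Tuple

# Hypothetical bracket inventory; n selects the first n types (Dyck-n, n <= 4).
BRACKETS: List[Tuple[str, str]] = [("(", ")"), ("[", "]"), ("{", "}"), ("<", ">")]


def sample_bounded_dyck(length: int, n: int, max_depth: int, rng: random.Random) -> str:
    """Sample a Dyck-n word of exactly `length` symbols with nesting depth <= max_depth."""
    if length % 2 != 0 or max_depth < 1 or not 1 <= n <= len(BRACKETS):
        raise ValueError("need even length, max_depth >= 1, and 1 <= n <= 4")
    stack: List[str] = []  # pending closing brackets
    out: List[str] = []
    for remaining in range(length, 0, -1):
        depth = len(stack)
        # Opening is allowed only below the depth cap and only if enough
        # symbols remain to close everything afterwards.
        can_open = depth < max_depth and remaining >= depth + 2
        can_close = depth > 0
        if can_open and (not can_close or rng.random() < 0.5):
            open_, close = rng.choice(BRACKETS[:n])
            stack.append(close)
            out.append(open_)
        else:
            out.append(stack.pop())
    return "".join(out)


def max_nesting_depth(word: str) -> int:
    """Maximum nesting depth of a well-nested bracket word."""
    openers = {o for o, _ in BRACKETS}
    depth = best = 0
    for ch in word:
        depth += 1 if ch in openers else -1
        best = max(best, depth)
    return best


# Illustrative split: short training strings, longer test strings, same depth bound.
rng = random.Random(0)
train = [sample_bounded_dyck(rng.randrange(2, 51, 2), 2, 10, rng) for _ in range(3)]
test = [sample_bounded_dyck(rng.randrange(52, 101, 2), 2, 10, rng) for _ in range(3)]
```

Holding `max_depth` fixed while increasing the length isolates length generalization from depth generalization.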
