Using Regular Languages to Explore the Representational Capacity of Recurrent Neural Architectures
Abhijit Mahalunkar, John D. Kelleher

TL;DR
This paper investigates how well various recurrent neural network architectures can handle long-distance dependencies in sequences by using synthetic datasets based on subregular languages with controllable complexity.
Contribution
It introduces a benchmarking approach that uses synthetic datasets drawn from Strictly k-Piecewise (SPk) languages to systematically evaluate how well RNNs model increasing degrees of long-distance dependencies.
Findings
LDDs significantly impact RNN performance
Synthetic datasets effectively benchmark RNNs' capabilities
SPk languages serve as useful controlled test cases
Abstract
The presence of Long Distance Dependencies (LDDs) in sequential data poses significant challenges for computational models. Various recurrent neural architectures have been designed to mitigate this issue. To test these state-of-the-art architectures, there is a growing need for rich benchmarking datasets. However, one drawback of existing datasets is the lack of experimental control with regard to the presence and/or degree of LDDs. This lack of control limits the analysis of model performance in relation to the specific challenge posed by LDDs. One way to address this is to use synthetic data having the properties of subregular languages. The degree of LDDs within the generated data can be controlled through the k parameter, the length of the generated strings, and the choice of forbidden strings. In this paper, we explore the capacity of different RNN extensions…
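To make the three control knobs named in the abstract concrete (k, string length, and the forbidden strings), here is a minimal sketch of membership testing and sampling for a Strictly k-Piecewise language, where a string is in the language if it contains none of the forbidden strings as a possibly non-contiguous subsequence. This is an illustration under stated assumptions, not the authors' generator: the function names, the rejection-sampling strategy, and the example alphabet are all hypothetical.

```python
import random


def contains_subsequence(s, pattern):
    """True if `pattern` occurs in `s` as a (possibly non-contiguous) subsequence."""
    it = iter(s)
    # `ch in it` advances the iterator until `ch` is found, so order is respected.
    return all(ch in it for ch in pattern)


def in_sp_language(s, forbidden):
    """True if `s` avoids every forbidden subsequence (each of length <= k)."""
    return not any(contains_subsequence(s, f) for f in forbidden)


def sample_strings(alphabet, length, forbidden, n, seed=0, max_tries=10000):
    """Rejection-sample up to `n` strings of a fixed length from the SP language.

    Assumption: uniform rejection sampling is used only for illustration;
    it may return fewer than `n` strings if `max_tries` is exhausted.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(max_tries):
        if len(samples) == n:
            break
        s = "".join(rng.choice(alphabet) for _ in range(length))
        if in_sp_language(s, forbidden):
            samples.append(s)
    return samples


if __name__ == "__main__":
    # k = 2: the single forbidden subsequence "ab" means no 'a' may precede any 'b'.
    for s in sample_strings("abcd", length=8, forbidden={"ab"}, n=5):
        print(s)
```

With the forbidden subsequence "ab", the string "acccb" is rejected no matter how many symbols separate the 'a' from the 'b', so increasing the string length stretches the distance over which a model must remember the earlier symbol.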
