Training Neural Networks as Recognizers of Formal Languages
Alexandra Butoi, Ghazal Khalighinejad, Anej Svete, Josef Valvoda, Ryan Cotterell, Brian DuSell

TL;DR
This paper investigates the formal language recognition capabilities of neural networks by training them directly as classifiers on various languages, providing a new benchmark and insights into their reasoning power across the Chomsky hierarchy.
Contribution
It introduces a method for training neural networks as recognizers of formal languages and provides a benchmark dataset, FLaRe, for empirical evaluation of language recognition capabilities.
Findings
RNNs and LSTMs often outperform transformers in language recognition tasks.
Auxiliary objectives like language modeling can improve recognition performance.
Performance varies across architectures and languages, with no single approach being universally best.
Abstract
Characterizing the computational power of neural network architectures in terms of formal language theory remains a crucial line of research, as it describes lower and upper bounds on the reasoning capabilities of modern AI. However, when empirically testing these bounds, existing work often leaves a discrepancy between experiments and the formal claims they are meant to support. The problem is that formal language theory pertains specifically to recognizers: machines that receive a string as input and classify whether it belongs to a language. On the other hand, it is common instead to evaluate language models on proxy tasks, e.g., language modeling or sequence-to-sequence transduction, that are similar in only an informal sense to the underlying theory. We correct this mismatch by training and evaluating neural networks directly as binary classifiers of strings, using a general method…
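To make the recognizer framing concrete, here is a minimal sketch of the task setup: strings paired with membership labels, and a classifier scored by accuracy. The parity language, the function names, and the uniform random sampling are illustrative assumptions, not the paper's data generation method.

```python
import random

def in_parity(s: str) -> bool:
    """Ground-truth recognizer for an illustrative language:
    binary strings containing an even number of 1s."""
    return s.count("1") % 2 == 0

def make_dataset(n: int, max_len: int, seed: int = 0):
    """Draw n random binary strings of length 0..max_len and label each
    with its membership in the language (hypothetical sampling scheme)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        length = rng.randint(0, max_len)
        s = "".join(rng.choice("01") for _ in range(length))
        data.append((s, in_parity(s)))
    return data

def accuracy(classifier, data) -> float:
    """Fraction of strings on which the classifier's accept/reject
    decision matches the ground-truth label."""
    correct = sum(classifier(s) == label for s, label in data)
    return correct / len(data)
```

Any trained model can be plugged in as `classifier` as long as it maps a string to an accept/reject decision, which is what distinguishes this setup from proxy tasks such as language modeling.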
Peer Reviews
Decision: ICLR 2025 Poster
The strength of the paper lies in its motivation to quantify gaps between different architectures by formal language recognition tasks. The authors start with a discussion on the basis of formal language theory, and how existing works differ from this basis in their experimental settings. Then, they propose a novel data generation algorithm that can sample positive and hard negative samples from each formal language. The algorithm has been discussed in great detail, and the running time has been
As such, the work doesn't have many weaknesses. I have a couple of questions regarding the setup. a) **Behavior with input length** - The authors report aggregated scores over strings of all lengths. It would be interesting to include a discussion on input length vs. performance for different models. Are there tasks where the transformer's performance shows a more exponential/drastic drop with input length? b) **Tasks where additional loss terms hurt:** - Are there tasks where the additional
This is a well-performed study of the ability of neural networks to recognize formal languages. The writing is of good quality, the methodology is clearly communicated, and it introduces an efficient algorithm for sampling strings from regular languages. Models are trained with validation sets of length [0, 40] (short) or [0, 80] (long) and tested on a set with strings of length [0, 500]. The authors of this study introduce several additional languages -- Dyck-(2, 3), binary strings that start w
* The general method for training neural networks as language recognizers is straightforward and is difficult to see as a substantial contribution (see similarities with e.g. [2]). * The algorithm for efficiently sampling from a regular language is described, and the authors state that they use it for sampling training and evaluation instances from the regular languages (class `R` in Table 1). Since the algorithm is only used to generate data for 25% of the types of languages in this study, the
1. It is an important direction to study neural networks' computational ability through formal language theory. 2. This paper introduces an efficient algorithm for length-controlled sampling from finite automata, which may have practical value for future research in formal language processing. 3. Experiments are conducted on a variety of formal languages with 3 neural models with different architectures (RNN, LSTM, transformer). The methodology appears to be well-documented and reproducible.
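As an illustration of what length-controlled sampling from a finite automaton can look like, the sketch below uses a standard counting-based dynamic program to draw a string of exactly length n uniformly from a DFA's language. This is a textbook technique shown under stated assumptions and is not claimed to be the paper's algorithm; the example DFA (even number of 1s) and all names are hypothetical.

```python
import random

def count_table(delta, accepting, states, alphabet, n):
    """counts[k][q] = number of length-k strings that lead from state q
    to an accepting state."""
    counts = [{q: int(q in accepting) for q in states}]
    for _ in range(n):
        prev = counts[-1]
        counts.append({q: sum(prev[delta[q, a]] for a in alphabet)
                       for q in states})
    return counts

def sample_string(delta, start, accepting, states, alphabet, n, rng):
    """Sample uniformly among accepted strings of length n, or return
    None if the language has no string of that length."""
    counts = count_table(delta, accepting, states, alphabet, n)
    if counts[n][start] == 0:
        return None
    q, out = start, []
    for k in range(n, 0, -1):
        # Weight each symbol by how many accepted completions remain.
        weights = [counts[k - 1][delta[q, a]] for a in alphabet]
        a = rng.choices(alphabet, weights=weights)[0]
        out.append(a)
        q = delta[q, a]
    return "".join(out)

# Hypothetical example DFA: binary strings with an even number of 1s.
STATES = [0, 1]
ALPHABET = ["0", "1"]
ACCEPTING = {0}
DELTA = {(0, "0"): 0, (0, "1"): 1, (1, "0"): 1, (1, "1"): 0}
```

Calling `sample_string(DELTA, 0, ACCEPTING, STATES, ALPHABET, 10, random.Random(0))` yields a length-10 string with an even number of 1s. Building the table takes time proportional to n times the number of transitions, so it can be computed once and reused across samples.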
1. The paper overlooks significant existing work on training neural networks for formal language recognition tasks (e.g. [1, 2, 3]). This oversight diminishes the claimed novelty of the proposed experimental setup. The authors should acknowledge and position their work in relation to previous studies. 2. The claimed technical improvement in sampling from finite automata appears to be the main novel contribution, but its significance needs better contextualization. References [1] Bhattami
Taxonomy
Topics: Natural Language Processing Techniques · Neural Networks and Applications
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory
