Extracting Finite Automata from RNNs Using State Merging
William Merrill, Nikolaos Tsilivis

TL;DR
This paper introduces a method, inspired by state merging in grammatical inference, for extracting finite automata from RNNs. The extracted automata are evaluated on benchmark languages and used to analyze how the RNN's internal states compress during training.
Contribution
The paper presents a new state merging-based approach for extracting automata from RNNs, demonstrates its effectiveness on the Tomita languages benchmark, and uses it to study what happens when an RNN is trained beyond convergence.
Findings
Extraction performance improves as more data is provided during extraction.
Training beyond convergence leads to internal state compression.
Extracted automata faithfully represent RNN behavior.
Abstract
One way to interpret the behavior of a black-box recurrent neural network (RNN) is to extract from it a more interpretable discrete computational model, like a finite state machine, that captures its behavior. In this work, we propose a new method for extracting finite automata from RNNs inspired by the state merging paradigm from grammatical inference. We demonstrate the effectiveness of our method on the Tomita languages benchmark, where we find that it is able to extract faithful automata from RNNs trained on all languages in the benchmark. We find that extraction performance is aided by the amount of data provided during the extraction process, as well as, curiously, whether the RNN model is trained for additional epochs after perfectly learning its target language. We use our method to analyze this phenomenon, finding that training beyond convergence is useful because it leads to…
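To make the state merging idea concrete, here is a minimal sketch of one way such an extraction could look. It is not the authors' algorithm: the abstract only says the method is inspired by state merging, so the greedy distance-based merge rule, the threshold `tau`, and the first-seen rule for resolving transitions are all assumptions made for illustration. The functions `toy_hidden_state` and `toy_accepts` are hypothetical stand-ins for a trained RNN's hidden state and its accept/reject decision, here for the language of strings with an even number of 0s and an even number of 1s.

```python
import math
from itertools import product

ALPHABET = "01"

def toy_hidden_state(prefix):
    # Hypothetical stand-in for an RNN hidden state: the parities of the
    # 0-count and 1-count, plus a small deterministic perturbation so that
    # equivalent prefixes are close but not identical.
    noise = 0.01 * (len(prefix) % 3)
    return (prefix.count("0") % 2 + noise, prefix.count("1") % 2 - noise)

def toy_accepts(prefix):
    # Hypothetical stand-in for the RNN's accept/reject decision.
    return prefix.count("0") % 2 == 0 and prefix.count("1") % 2 == 0

def extract_automaton(strings, hidden_state, accepts, tau):
    # 1. Prefix tree: one candidate state per prefix of the data.
    prefixes = sorted({s[:i] for s in strings for i in range(len(s) + 1)},
                      key=lambda p: (len(p), p))
    # 2. Merge (assumed rule): a prefix joins the first existing state whose
    #    representative hidden state lies within distance tau.
    reps, state_of = [], {}
    for p in prefixes:
        h = hidden_state(p)
        for q, rep in enumerate(reps):
            if math.dist(h, hidden_state(rep)) < tau:
                state_of[p] = q
                break
        else:
            state_of[p] = len(reps)
            reps.append(p)
    # 3. Transitions between merged states; the first one seen wins
    #    (an assumed, simplistic way to resolve conflicts).
    transitions = {}
    for p in prefixes:
        for a in ALPHABET:
            if p + a in state_of:
                transitions.setdefault((state_of[p], a), state_of[p + a])
    # 4. A merged state is accepting if its representative is accepted.
    accepting = {q for q, rep in enumerate(reps) if accepts(rep)}
    return state_of[""], transitions, accepting

def run(automaton, string):
    state, transitions, accepting = automaton
    for a in string:
        if (state, a) not in transitions:
            return False  # unseen transition: reject by convention
        state = transitions[(state, a)]
    return state in accepting

# Extract from all strings of length <= 4.
data = ["".join(w) for n in range(5) for w in product(ALPHABET, repeat=n)]
automaton = extract_automaton(data, toy_hidden_state, toy_accepts, tau=0.1)
```

Faithfulness can then be estimated by comparing `run(automaton, s)` with `toy_accepts(s)` on strings longer than those used for extraction. With a real RNN, the hidden states would come from the network, and the choice of `tau` and the amount of extraction data would both affect how many states survive merging.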
Taxonomy
Topics: Machine Learning and Algorithms · Ferroelectric and Negative Capacitance Devices · Topic Modeling
