Extracting Moore Machines from Transformers using Queries and Counterexamples
Rik Adriaensen, Jaron Maene

TL;DR
This paper introduces a method to extract Moore machines from transformer models trained on regular languages, enabling better comparison of their formal language learning capabilities through queries and counterexamples.
Contribution
It presents an approach for extracting finite state automata, specifically Moore machines, from transformers using queries and counterexamples, so that the formal languages they have learned can be analyzed and compared.
Findings
Extracted Moore machines from transformers trained on regular languages
Studied positive-only learning and the sequence accuracy measure in detail
Provided a framework for comparing results on what formal languages transformers can learn
Abstract
Fuelled by the popularity of the transformer architecture in deep learning, several works have investigated what formal languages a transformer can learn from data. Nonetheless, existing results remain hard to compare due to methodological differences. To address this, we construct finite state automata as high-level abstractions of transformers trained on regular languages using queries and counterexamples. Concretely, we extract Moore machines, as many training tasks used in literature can be mapped onto them. We demonstrate the usefulness of this approach by studying positive-only learning and the sequence accuracy measure in detail.
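To make the abstract's terminology concrete, the following is a minimal sketch of a Moore machine and of a sampling-based counterexample search. It is not the authors' implementation. The names `MooreMachine`, `find_counterexample`, and `parity_oracle` are hypothetical, the oracle is a plain Python function standing in for a trained transformer, and random sampling is an assumed stand-in for whatever equivalence check the paper actually uses.

```python
import random
from dataclasses import dataclass


@dataclass
class MooreMachine:
    """A Moore machine: each state carries an output label."""
    initial: int
    transitions: dict  # (state, symbol) -> next state
    outputs: dict      # state -> output label

    def run(self, word):
        """Return the output after every prefix of `word`, including the empty prefix."""
        state = self.initial
        trace = [self.outputs[state]]
        for symbol in word:
            state = self.transitions[(state, symbol)]
            trace.append(self.outputs[state])
        return trace


def find_counterexample(machine, oracle, alphabet, max_len=10, samples=1000, seed=0):
    """Approximate an equivalence query by random sampling (an assumption of this sketch).

    `oracle(word)` plays the role of a membership query to the model under study.
    Returns a word on which the machine and the oracle disagree, or None.
    """
    rng = random.Random(seed)
    for _ in range(samples):
        word = [rng.choice(alphabet) for _ in range(rng.randint(0, max_len))]
        if machine.run(word)[-1] != oracle(word):
            return word
    return None


# Hypothetical target language: words with an even number of 'a'.
def parity_oracle(word):
    return word.count("a") % 2 == 0


# A hypothesis that matches the target, and one that wrongly accepts everything.
correct = MooreMachine(
    initial=0,
    transitions={(0, "a"): 1, (0, "b"): 0, (1, "a"): 0, (1, "b"): 1},
    outputs={0: True, 1: False},
)
wrong = MooreMachine(
    initial=0,
    transitions={(0, "a"): 0, (0, "b"): 0},
    outputs={0: True},
)
```

In a query-based learning loop, a counterexample returned by `find_counterexample` would be used to refine the hypothesis machine, and the loop would stop once no disagreement is found. Because the check here is sampled, a `None` result only suggests agreement on words up to `max_len`; it does not prove equivalence.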
Taxonomy
Topics: Algorithms and Data Compression · Machine Learning and Algorithms
Methods: Softmax · Attention Is All You Need
