Extracting Moore Machines from Transformers using Queries and Counterexamples

Rik Adriaensen; Jaron Maene

arXiv:2410.06045·cs.LG·September 30, 2025

Open Access

TL;DR

This paper introduces a method that uses queries and counterexamples to extract Moore machines from transformers trained on regular languages, making results on what formal languages transformers can learn easier to compare.

Contribution

The paper presents an approach for deriving finite state automata, specifically Moore machines, from transformers trained on regular languages, so that the languages they have learned can be analyzed at a high level of abstraction.

Findings

01 Successfully extracted Moore machines from transformers
02 Analyzed positive-only learning and sequence accuracy
03 Provided a framework for comparing transformer capabilities

Abstract

Fuelled by the popularity of the transformer architecture in deep learning, several works have investigated what formal languages a transformer can learn from data. Nonetheless, existing results remain hard to compare due to methodological differences. To address this, we construct finite state automata as high-level abstractions of transformers trained on regular languages using queries and counterexamples. Concretely, we extract Moore machines, as many training tasks used in literature can be mapped onto them. We demonstrate the usefulness of this approach by studying positive-only learning and the sequence accuracy measure in detail.
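
To make the terms in the abstract concrete, the following is a minimal sketch of a Moore machine together with a counterexample search. It is not the authors' implementation. The names `MooreMachine`, `find_counterexample`, and `parity_oracle` are hypothetical, the oracle is a hand-written stand-in for a trained transformer, and the bounded exhaustive search is only an assumed substitute for whatever equivalence check the paper uses.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Dict, List, Optional, Sequence, Tuple


@dataclass
class MooreMachine:
    """Finite state machine whose output depends only on the current state."""
    initial: int
    transitions: Dict[Tuple[int, str], int]  # (state, symbol) -> next state
    outputs: Dict[int, str]                  # state -> output label

    def run(self, word: Sequence[str]) -> List[str]:
        """Return the output of every state visited, starting with the initial state."""
        state = self.initial
        emitted = [self.outputs[state]]
        for symbol in word:
            state = self.transitions[(state, symbol)]
            emitted.append(self.outputs[state])
        return emitted


def find_counterexample(
    machine: MooreMachine,
    oracle: Callable[[Sequence[str]], str],
    alphabet: Sequence[str],
    max_len: int,
) -> Optional[Tuple[str, ...]]:
    """Search all words up to max_len for one where machine and oracle disagree.

    Assumption: a bounded exhaustive search stands in for an equivalence query.
    Returns the first disagreeing word (shortest first), or None if none is found.
    """
    for length in range(max_len + 1):
        for word in product(alphabet, repeat=length):
            if machine.run(word)[-1] != oracle(word):
                return word
    return None


def parity_oracle(word: Sequence[str]) -> str:
    """Hypothetical stand-in for a trained model: parity of the number of 'a' symbols."""
    return "even" if list(word).count("a") % 2 == 0 else "odd"


# A two-state hypothesis that matches the oracle.
correct = MooreMachine(
    initial=0,
    transitions={(0, "a"): 1, (0, "b"): 0, (1, "a"): 0, (1, "b"): 1},
    outputs={0: "even", 1: "odd"},
)

# A one-state hypothesis that always answers "even".
wrong = MooreMachine(
    initial=0,
    transitions={(0, "a"): 0, (0, "b"): 0},
    outputs={0: "even"},
)
```

In this sketch, a membership-style query corresponds to calling `parity_oracle` on a single word, while `find_counterexample` plays the role of the check that either accepts a hypothesis or returns a word on which it must be refined.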

Taxonomy

Topics: Algorithms and Data Compression · Machine Learning and Algorithms

Methods: Softmax · Attention Is All You Need