Automata Extraction from Transformers
Yihao Zhang, Zeming Wei, Meng Sun

TL;DR
This paper introduces an automata extraction method for Transformer models that interprets how they process formal languages, making their operational mechanisms more transparent.
Contribution
It presents a new algorithm that treats Transformers as black boxes and extracts deterministic finite automata to interpret their language processing capabilities.
Findings
Transformers can be interpreted as automata using the proposed method.
The approach reveals how Transformers understand formal language structures.
The method enhances interpretability of Transformer models in language tasks.
Abstract
In modern machine learning (ML) systems, Transformer-based architectures have achieved milestone success across a broad spectrum of tasks, yet understanding their operational mechanisms remains an open problem. To improve the transparency of ML systems, automata extraction methods, which interpret stateful ML models as automata typically through formal languages, have proven effective for explaining the mechanism of recurrent neural networks (RNNs). However, few works have applied this paradigm to Transformer models. In particular, understanding how they process formal languages and identifying their limitations in this area remain unexplored. In this paper, we propose an automata extraction algorithm specifically designed for Transformer models. Treating the Transformer model as a black-box system, we track the model through the transformation process of its internal…
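To make the idea of black-box automata extraction concrete, here is a minimal sketch. It is not the authors' algorithm; it uses a generic, assumed technique: prefixes are grouped into states by the model's accept/reject answers on a bounded set of test suffixes (a bounded Myhill-Nerode approximation, related in spirit to the observation tables of Angluin's L* but without equivalence queries). The names `extract_dfa`, `run_dfa`, `words_up_to`, and the oracle `even_as` are hypothetical; in practice the oracle would wrap a trained Transformer classifier, replaced here by a toy parity rule.

```python
from itertools import product
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical black-box interface: maps an input string to accept/reject.
# In practice this would wrap a trained Transformer classifier (assumption).
Oracle = Callable[[str], bool]
DFA = Tuple[int, Dict[Tuple[int, str], int], Set[int]]


def words_up_to(alphabet: str, n: int) -> List[str]:
    """All strings over `alphabet` of length 0..n, shortest first."""
    words = [""]
    for k in range(1, n + 1):
        words.extend("".join(p) for p in product(alphabet, repeat=k))
    return words


def extract_dfa(oracle: Oracle, alphabet: str, depth: int = 3) -> DFA:
    """Approximate a DFA using black-box membership queries only.

    Two prefixes are merged into one state when the oracle gives the same
    answers on prefix + s for every test suffix s of length <= depth.
    """
    suffixes = words_up_to(alphabet, depth)  # suffixes[0] is the empty string
    state_ids: Dict[Tuple[bool, ...], int] = {}
    representatives: List[str] = []  # one access string per discovered state

    def state_of(prefix: str) -> int:
        signature = tuple(oracle(prefix + s) for s in suffixes)
        if signature not in state_ids:
            state_ids[signature] = len(representatives)
            representatives.append(prefix)
        return state_ids[signature]

    start = state_of("")
    transitions: Dict[Tuple[int, str], int] = {}
    i = 0
    while i < len(representatives):  # breadth-first over discovered states
        for symbol in alphabet:
            transitions[(i, symbol)] = state_of(representatives[i] + symbol)
        i += 1
    # A state accepts iff the oracle accepts its prefix with the empty suffix.
    accepting = {q for sig, q in state_ids.items() if sig[0]}
    return start, transitions, accepting


def run_dfa(dfa: DFA, word: str) -> bool:
    """Simulate the extracted DFA on `word`."""
    start, transitions, accepting = dfa
    state = start
    for symbol in word:
        state = transitions[(state, symbol)]
    return state in accepting


# Toy stand-in for a trained model: accept strings with an even number of 'a's.
def even_as(word: str) -> bool:
    return word.count("a") % 2 == 0


if __name__ == "__main__":
    dfa = extract_dfa(even_as, "ab")
    print(len(dfa[1]) // 2, "states")  # two transitions per state over {a, b}
    print(run_dfa(dfa, "abab"), run_dfa(dfa, "ab"))  # True False
```

Because only a finite suffix set is tested, the extracted automaton is an approximation whose fidelity should be checked against the model on held-out strings; L* addresses this by refining its table with counterexamples returned from equivalence queries.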
Taxonomy
Topics: semigroups and automata theory · Algorithms and Data Compression · Handwritten Text Recognition Techniques
Methods: Attention Is All You Need · Softmax · Layer Normalization · Linear Layer · Byte Pair Encoding · Label Smoothing · Adam · Residual Connection · Multi-Head Attention · Position-Wise Feed-Forward Layer
