Automata Extraction from Transformers

Yihao Zhang; Zeming Wei; Meng Sun

arXiv:2406.05564 · cs.LG · June 11, 2024




TL;DR

This paper introduces a novel automata extraction method for Transformer models, enabling interpretation of their processing of formal languages and improving transparency in understanding their operational mechanisms.

Contribution

It presents a new algorithm that treats Transformers as black boxes and extracts deterministic finite automata to interpret their language processing capabilities.
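To make the idea of black-box DFA extraction concrete, here is a minimal sketch in Python. It is not the paper's algorithm, whose details are not given on this page. It uses a simplified observation-table technique: prefixes are grouped into states according to how the black box labels them under a fixed set of test suffixes. The names `black_box`, `extract_dfa`, `dfa_accepts`, and `fidelity` are hypothetical, and `black_box` is a hand-written stand-in for a trained Transformer classifier.

```python
from collections import deque
from itertools import product


def black_box(word: str) -> bool:
    """Hypothetical stand-in for a trained Transformer classifier.

    Accepts strings over {a, b} that contain an even number of 'a'.
    """
    return word.count("a") % 2 == 0


def extract_dfa(oracle, alphabet, suffix_len=2):
    """Build a DFA using only membership queries to `oracle`.

    Two prefixes are merged into one state when the oracle gives the
    same answers for every test suffix of length <= suffix_len.
    """
    suffixes = [""] + [
        "".join(p)
        for n in range(1, suffix_len + 1)
        for p in product(alphabet, repeat=n)
    ]

    def signature(prefix):
        return tuple(oracle(prefix + s) for s in suffixes)

    start_sig = signature("")
    states = {start_sig: 0}      # signature -> state id
    transitions = {}             # (state id, symbol) -> state id
    accepting = set()
    queue = deque([("", start_sig)])
    while queue:
        prefix, sig = queue.popleft()
        state = states[sig]
        if sig[0]:               # suffixes[0] == "", so sig[0] == oracle(prefix)
            accepting.add(state)
        for symbol in alphabet:
            next_sig = signature(prefix + symbol)
            if next_sig not in states:
                states[next_sig] = len(states)
                queue.append((prefix + symbol, next_sig))
            transitions[(state, symbol)] = states[next_sig]
    return 0, transitions, accepting


def dfa_accepts(dfa, word):
    """Run the extracted DFA on `word`."""
    start, transitions, accepting = dfa
    state = start
    for symbol in word:
        state = transitions[(state, symbol)]
    return state in accepting


def fidelity(dfa, oracle, alphabet, max_len):
    """Fraction of strings up to max_len on which DFA and oracle agree."""
    words = [
        "".join(p)
        for n in range(max_len + 1)
        for p in product(alphabet, repeat=n)
    ]
    agree = sum(dfa_accepts(dfa, w) == oracle(w) for w in words)
    return agree / len(words)


dfa = extract_dfa(black_box, "ab")
```

Calling `fidelity(dfa, black_box, "ab", 6)` measures how closely the extracted automaton reproduces the black box on short strings. For a real model, the choice of test suffixes and the agreement metric would both need more care than this sketch gives them.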

Findings

1. Transformers can be interpreted as automata using the proposed method.

2. The approach reveals how Transformers understand formal language structures.

3. The method enhances interpretability of Transformer models in language tasks.

Abstract

In modern machine learning (ML) systems, Transformer-based architectures have achieved milestone success across a broad spectrum of tasks, yet understanding their operational mechanisms remains an open problem. To improve the transparency of ML systems, automata extraction methods, which interpret stateful ML models as automata, typically through formal languages, have proven effective for explaining the mechanism of recurrent neural networks (RNNs). However, few works have applied this paradigm to Transformer models. In particular, understanding their processing of formal languages and identifying their limitations in this area remains unexplored. In this paper, we propose an automata extraction algorithm specifically designed for Transformer models. Treating the Transformer model as a black-box system, we track the model through the transformation process of their internal…


Code & Models

Repositories

zhang-yihao/transfomer2dfa
none · Official


Taxonomy

Topics: Semigroups and Automata Theory · Algorithms and Data Compression · Handwritten Text Recognition Techniques

Methods: Attention Is All You Need · Softmax · Layer Normalization · Linear Layer · Byte Pair Encoding · Label Smoothing · Adam · Residual Connection · Multi-Head Attention · Position-Wise Feed-Forward Layer