Representing Rule-based Chatbots with Transformers
Dan Friedman, Abhishek Panigrahi, Danqi Chen

TL;DR
This paper studies how Transformers can implement ELIZA, a classic rule-based chatbot, combining a theoretical construction with an empirical analysis of the mechanisms that trained models use.
Contribution
The paper presents a theoretical construction of a Transformer that implements ELIZA and analyzes the mechanisms that trained Transformers learn, connecting neural dialogue models to an interpretable, rule-based reference system.
Findings
Transformers favor induction head mechanisms over position-based copying.
Models use their own intermediate generations as an implicit scratchpad, similar to Chain-of-Thought.
Empirical analysis reveals preferred mechanisms in trained conversational Transformers.
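The contrast between induction-head copying and position-based copying in the findings above can be illustrated with a toy sketch in plain Python. It is not the paper's construction or a Transformer implementation. The function names `induction_copy`, `position_copy`, and `run` are hypothetical, and the "induction" step is simplified to a lookup of the first earlier occurrence of the previous token.

```python
def induction_copy(source, generated):
    """Content-based copy: find the last emitted token in the source
    and emit the token that follows it (simplified induction-style lookup)."""
    if not generated:
        return source[0]
    prev = generated[-1]
    idx = source.index(prev)  # first match; ambiguous if the token repeats
    return source[idx + 1] if idx + 1 < len(source) else None


def position_copy(source, generated):
    """Position-based copy: emit the source token at the current output index."""
    i = len(generated)
    return source[i] if i < len(source) else None


def run(copy_step, source):
    """Generate up to len(source) tokens, one step at a time."""
    out = []
    for _ in range(len(source)):
        tok = copy_step(source, out)
        if tok is None:
            break
        out.append(tok)
    return out


# Both strategies agree when every token is unique.
print(run(induction_copy, ["you", "are", "kind"]))
print(run(position_copy, ["you", "are", "kind"]))

# With a repeated token, the content-based lookup in this toy version loops.
print(run(induction_copy, ["a", "b", "a", "c"]))
print(run(position_copy, ["a", "b", "a", "c"]))
```

In this sketch the content-based strategy needs no position information but can go wrong when a token repeats, while the position-based strategy is exact but depends on tracking the output index.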
Abstract
What kind of internal mechanisms might Transformers use to conduct fluid, natural-sounding conversations? Prior work has illustrated by construction how Transformers can solve various synthetic tasks, such as sorting a list or recognizing formal languages, but it remains unclear how to extend this approach to a conversational setting. In this work, we propose using ELIZA, a classic rule-based chatbot, as a setting for formal, mechanistic analysis of Transformer-based chatbots. ELIZA allows us to formally model key aspects of conversation, including local pattern matching and long-term dialogue state tracking. We first present a theoretical construction of a Transformer that implements the ELIZA chatbot. Building on prior constructions, particularly those for simulating finite-state automata, we show how simpler mechanisms can be composed and extended to produce more sophisticated…
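To make the two aspects of conversation named in the abstract concrete, here is a minimal ELIZA-style sketch in Python. It assumes a tiny, hypothetical rule set and a simple first-in, first-out memory. The rules, the `TinyEliza` class, and the reply templates are illustrative only and are not taken from the original ELIZA script or from the paper. Pattern rules stand in for local pattern matching, and the memory queue stands in for long-term dialogue state.

```python
import re
from collections import deque

# Hypothetical rules: (pattern, reply template, optional memory template).
RULES = [
    (re.compile(r"\bmy (\w+)"),
     "tell me more about your {0}",
     "earlier you mentioned your {0}"),
    (re.compile(r"\bi am (.+)"), "why are you {0}", None),
]
FALLBACK = "please go on"


class TinyEliza:
    def __init__(self):
        # Dialogue state: remembered replies, used when no rule matches.
        self.memory = deque()

    def respond(self, utterance):
        text = utterance.lower().strip(" .!?")
        for pattern, reply, memo in RULES:
            match = pattern.search(text)
            if match:
                if memo is not None:
                    # Store a follow-up for a later turn.
                    self.memory.append(memo.format(*match.groups()))
                return reply.format(*match.groups())
        # No local pattern matched: fall back on long-term state if any.
        if self.memory:
            return self.memory.popleft()
        return FALLBACK


bot = TinyEliza()
for line in ["My dog is sick.", "I am sad", "Hello", "Hello"]:
    print(bot.respond(line))
```

The third turn matches no rule, so the reply comes from the memory queue filled on the first turn. That behavior depends on state carried across turns rather than on the current input alone.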
Taxonomy
Topics: Multi-Agent Systems and Negotiation
Methods: Attention Is All You Need · Byte Pair Encoding · Layer Normalization · Linear Layer · Label Smoothing · Adam · Dropout · Multi-Head Attention · Dense Connections · Softmax
