The Chess Transformer: Mastering Play using Generative Language Models
David Noever, Matt Ciolino, Josh Kalin

TL;DR
This paper introduces a transformer-based model trained on millions of chess games that can generate strategic moves, recognize openings, and support human interaction in chess, bridging language modeling and game strategy.
Contribution
The work demonstrates that language transformers can be adapted to learn and generate complex chess strategies from large game datasets, extending their application beyond natural language.
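Adapting a language model to chess starts with turning archived games into plain training text. The sketch below is one hypothetical way to do that, not the authors' pipeline. It assumes standard PGN input, keeps only the movetext (dropping tag-pair headers, brace comments and numeric annotation glyphs), and joins games with GPT-2's `<|endoftext|>` marker. Using that marker as a game separator is an assumption.

```python
import re
from typing import List

# GPT-2's end-of-text marker; using it as a game separator is an assumption.
EOT = "<|endoftext|>"

TAG_RE = re.compile(r"^\[[^\]]*\]", re.MULTILINE)  # [Event "..."] header lines
COMMENT_RE = re.compile(r"\{[^}]*\}")              # {brace comments}
NAG_RE = re.compile(r"\$\d+")                      # annotation glyphs such as $1


def clean_movetext(pgn: str) -> str:
    """Return only the movetext of one PGN game, on a single line."""
    text = TAG_RE.sub("", pgn)
    text = COMMENT_RE.sub("", text)
    text = NAG_RE.sub("", text)
    return " ".join(text.split())  # collapse newlines and repeated spaces


def build_corpus(games: List[str]) -> str:
    """Join cleaned games into one training text, one game per sample."""
    return f"\n{EOT}\n".join(clean_movetext(g) for g in games)


if __name__ == "__main__":
    game = '[Event "Casual"]\n[Result "1-0"]\n\n1. c4 {English} e5 2. Nc3 $1 Nf6 1-0'
    print(clean_movetext(game))  # 1. c4 e5 2. Nc3 Nf6 1-0
```

Recursive annotation variations in parentheses are not handled here; a production cleaner would need a small parser for nested `( ... )` blocks.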
Findings
Transformer generates plausible chess strategies.
Model recognizes classic chess openings.
Supports human-robot chess interaction.
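One simple way to check whether generated games display a classic opening is to match their first moves against a table of known move sequences. The sketch below is illustrative only: the small `OPENINGS` table and the `name_opening` helper are hypothetical, not the paper's evaluation method. It returns the longest matching prefix, so the Slav Exchange (1.d4 d5 2.c4 c6 3.cxd5 cxd5) takes precedence over the plain Slav Defense.

```python
from typing import List, Optional

# Hypothetical lookup table: well-known openings keyed by their SAN move prefix.
OPENINGS = {
    ("c4",): "English Opening",
    ("e4", "c5"): "Sicilian Defense",
    ("d4", "d5", "c4", "c6"): "Slav Defense",
    ("d4", "d5", "c4", "c6", "cxd5", "cxd5"): "Slav Defense, Exchange Variation",
}


def name_opening(moves: List[str]) -> Optional[str]:
    """Return the name of the longest table entry that prefixes `moves`."""
    best, best_len = None, 0
    for prefix, name in OPENINGS.items():
        n = len(prefix)
        if n > best_len and tuple(moves[:n]) == prefix:
            best, best_len = name, n
    return best
```

For example, `name_opening(["d4", "d5", "c4", "c6", "cxd5", "cxd5", "Nc3"])` returns `"Slav Defense, Exchange Variation"`. A full classifier would use a complete opening database (such as ECO codes) rather than a hand-written table.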
Abstract
This work demonstrates that natural language transformers can support more generic strategic modeling, particularly for text-archived games. In addition to learning natural language skills, the abstract transformer architecture can generate meaningful moves on a chessboard. With further fine-tuning, the transformer learns complex gameplay by training on 2.8 million chess games in Portable Game Notation. OpenAI's 774-million-parameter Generative Pre-trained Transformer (GPT-2) is fine-tuned for 30,000 training steps. The resulting Chess Transformer generates plausible strategies and displays game formations identifiable as classic openings, such as the English or the Slav Exchange. Finally, in live play, a human-to-transformer interface correctly filters illegal moves and provides a novel way to challenge the transformer's chess strategies.…
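The live-play interface filters illegal moves before they reach the board. A minimal sketch of such a filter, under stated assumptions: `sample` is a hypothetical callable that returns the model's generated continuation as PGN-style text, and `legal_san` is the set of legal moves in SAN for the current position. With the python-chess library that set could come from `[board.san(m) for m in board.legal_moves]`. The sketch takes the first move of each sample and resamples until it is legal or a retry budget (`max_tries`, an assumed parameter) runs out. It is not the authors' implementation.

```python
import re
from typing import Callable, Iterable, List, Optional

RESULTS = {"1-0", "0-1", "1/2-1/2", "*"}  # PGN game-termination markers


def san_tokens(generated: str) -> List[str]:
    """Split generated PGN-style text into bare SAN move tokens."""
    tokens = []
    for tok in generated.split():
        tok = re.sub(r"^\d+\.+", "", tok)  # "12." -> "", "3.Nf3" -> "Nf3", "5..." -> ""
        tok = tok.rstrip("!?")             # drop annotation marks such as "!?"
        if tok and tok not in RESULTS:
            tokens.append(tok)
    return tokens


def propose_move(sample: Callable[[], str], legal_san: Iterable[str],
                 max_tries: int = 10) -> Optional[str]:
    """Resample until the first generated move is legal; None if all tries fail."""
    legal = set(legal_san)
    for _ in range(max_tries):
        tokens = san_tokens(sample())
        # Only the first token is the move for the current position;
        # later tokens belong to future positions and are ignored.
        if tokens and tokens[0] in legal:
            return tokens[0]
    return None
```

Checking only the first token matters: a later token may be a legal move for the opponent, so accepting it would play the wrong side's move.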
Taxonomy
Topics: Artificial Intelligence in Games · Digital Games and Media · Sports Analytics and Performance
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Layer Normalization · Attention Is All You Need · Multi-Head Attention · Byte Pair Encoding · Label Smoothing · Dropout · Adam
