TL;DR
Chessformer is a unified transformer-based architecture for chess that improves move prediction, enhances playing strength, and offers interpretability by aligning model design with chess domain geometry.
Contribution
Introduces Chessformer, a single architecture that advances chess modeling by integrating domain-specific geometry into tokenization, positional encoding, and action prediction.
Findings
Achieves 57.1% move-matching accuracy, surpassing previous models.
Adds more than 100 Elo to Leela Chess Zero, which has gone on to win major competitions.
Enables granular interpretability of attention patterns on the chessboard.
Abstract
Chess has long served as a canonical testbed for artificial intelligence, but modeling approaches for its central tasks have diverged. Maximizing playing strength, predicting human play, and enabling interpretability are typically solved with disparate architectures, and these designs are often misaligned with the geometry of the domain. This raises the natural question of whether these objectives require separate modeling paradigms, or if there exists a single architecture that supports them simultaneously. We introduce Chessformer, a unified architecture that advances the state of the art on all three central goals in chess modeling. Chessformer is an encoder-only transformer that represents board squares as tokens, augments self-attention with a novel dynamic positional encoding called Geometric Attention Bias (GAB) that adapts to domain-specific geometry, and predicts actions with…
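The square-as-token representation and the idea of biasing self-attention by board geometry can be illustrated with a minimal sketch. This is not the paper's Geometric Attention Bias, which the abstract describes as dynamic. Here a fixed bias based on king-move (Chebyshev) distance is swapped in purely for illustration. All names (`square_tokens`, `distance_bias`, `biased_attention`), the piece vocabulary, and the `scale` parameter are assumptions, not taken from the paper.

```python
import math

PIECES = ".PNBRQKpnbrqk"  # assumed vocabulary; '.' marks an empty square


def square_tokens(board):
    """Flatten an 8x8 board (list of 8 strings) into 64 integer token ids."""
    return [PIECES.index(piece) for row in board for piece in row]


def distance_bias(scale=1.0):
    """Hypothetical fixed bias: -scale * Chebyshev distance between squares.

    A stand-in for a geometry-aware bias; the paper's GAB is dynamic.
    """
    bias = []
    for i in range(64):
        row = []
        for j in range(64):
            dist = max(abs(i // 8 - j // 8), abs(i % 8 - j % 8))
            row.append(-scale * dist)
        bias.append(row)
    return bias


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def biased_attention(q, k, v, bias):
    """Single-head attention over square tokens with an additive logit bias."""
    dim = len(q[0])
    outputs, weights = [], []
    for i in range(len(q)):
        logits = [
            sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(dim) + bias[i][j]
            for j in range(len(k))
        ]
        w = softmax(logits)
        weights.append(w)
        outputs.append(
            [sum(w[j] * v[j][c] for j in range(len(v))) for c in range(len(v[0]))]
        )
    return outputs, weights
```

With identical queries and keys, the bias alone shapes the attention pattern, so each square attends most strongly to itself and its neighbors. Because every token corresponds to one square, the resulting 64x64 weight matrix can be read directly as a square-to-square attention map.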