Mixture of Masters: Sparse Chess Language Models with Player Routing
Giacomo Frisoni, Lorenzo Molfetta, Davide Freddi, Gianluca Moro

TL;DR
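The paper introduces Mixture-of-Masters (MoM), a chess mixture-of-experts language model whose small GPT experts each emulate a world-class grandmaster. A gating network picks which persona to follow at each move, so the model switches style dynamically and improves performance and interpretability over traditional dense models.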
Contribution
The paper presents the first chess mixture-of-experts model with grandmaster expert personas. The model switches style from move to move and outperforms both dense individual expert networks and GPT baselines.
Findings
When evaluated against Stockfish on unseen standard games, MoM outperforms both dense individual expert networks and GPT baselines trained on aggregated data.
MoM maintains generation variety, control, and interpretability.
Expert personas emulate different grandmasters' styles effectively.
Abstract
Modern chess language models are dense transformers trained on millions of games played by thousands of high-rated individuals. However, these monolithic networks tend to collapse into mode-averaged behavior, where stylistic boundaries are blurred, and rare but effective strategies are suppressed. To counteract homogenization, we introduce Mixture-of-Masters (MoM), the first chess mixture-of-experts model with small-sized GPT experts emulating world-class grandmasters. For each move, a post-hoc learnable gating network selects the most appropriate persona to channel depending on the game state, allowing MoM to switch its style dynamically, e.g., Tal's offensive vocation or Petrosian's defensive solidity. When evaluated against Stockfish on unseen standard games, MoM outperforms both dense individual expert networks and popular GPT baselines trained on aggregated data, while ensuring…
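The per-move routing described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the two-number state features, the hand-set gate weights (the paper's gate is learned), the stub experts that return fixed moves, and the names `gate` and `route_move`. It only shows the general idea of scoring each persona for the current state and letting the top-scoring one produce the move.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical types: a game state is a small feature vector,
# and an expert maps a state to a move string.
State = Sequence[float]
Expert = Callable[[State], str]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gate(state: State, weights: Sequence[Sequence[float]]) -> List[float]:
    """Linear gate: one score per expert, normalized to probabilities.

    In a real system these weights would be learned; here they are fixed.
    """
    scores = [sum(w * x for w, x in zip(row, state)) for row in weights]
    return softmax(scores)


def route_move(
    state: State,
    experts: Sequence[Expert],
    names: Sequence[str],
    weights: Sequence[Sequence[float]],
) -> Tuple[str, str]:
    """Pick the highest-probability persona and let it choose the move."""
    probs = gate(state, weights)
    best = max(range(len(probs)), key=probs.__getitem__)
    return names[best], experts[best](state)


# Stub experts standing in for small GPT models (illustrative only).
NAMES = ["tal", "petrosian"]
EXPERTS: List[Expert] = [lambda s: "Nxf7", lambda s: "Kg1"]

# Toy features: [attacking chances, king danger]. Hand-set gate weights.
WEIGHTS = [[1.0, 0.0], [0.0, 1.0]]

sharp_position = [0.9, 0.1]
quiet_position = [0.1, 0.9]
print(route_move(sharp_position, EXPERTS, NAMES, WEIGHTS))
print(route_move(quiet_position, EXPERTS, NAMES, WEIGHTS))
```

Because the gate is evaluated again for every position, the selected persona can change from one move to the next, which is the style switching the abstract refers to.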
