Contrastive Sparse Autoencoders for Interpreting Planning of Chess-Playing Agents

Yoann Poupart

arXiv:2406.04028·cs.AI·June 12, 2024

TL;DR

This paper introduces contrastive sparse autoencoders (CSAE), a novel interpretability framework for analyzing multi-step planning in chess AI systems, enabling extraction of meaningful concepts from game trajectories.

Contribution

The paper presents CSAE, a new method for interpreting complex planning processes in chess AI, addressing limitations of previous single-state interpretability techniques.

Findings

1. CSAE successfully extracts human-understandable planning concepts.
2. Qualitative analysis reveals meaningful features related to chess strategies.
3. Sanity checks validate the robustness of the interpretability results.

Abstract

AI has led chess systems to a superhuman level, yet these systems rely heavily on black-box algorithms. This is unsustainable for ensuring transparency to the end user, particularly when such systems are responsible for sensitive decision-making. Recent interpretability work has shown that the inner representations of Deep Neural Networks (DNNs) are fathomable and contain human-understandable concepts. Yet these methods are seldom contextualised and are often based on a single hidden state, which makes them unable to interpret multi-step reasoning, e.g. planning. To address this, we propose contrastive sparse autoencoders (CSAE), a novel framework for studying pairs of game trajectories. Using CSAE, we are able to extract and interpret concepts that are meaningful to the chess agent's plans. We primarily focused on a qualitative analysis of the CSAE features before proposing an automated…
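
To make the idea of a sparse autoencoder applied to pairs of inputs more concrete, here is a minimal sketch in plain Python. It is not the paper's formulation: the abstract does not specify the CSAE objective, so the pairwise "overlap" penalty, the function names (`encode`, `sae_loss`, `pair_loss`) and the coefficients `l1_coeff` and `contrast_coeff` are all assumptions chosen for illustration. The two inputs `x_a` and `x_b` stand in for hidden-state vectors taken from two different game trajectories.

```python
def relu(v):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, x) for x in v]


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def encode(W_enc, b_enc, x):
    """Map an activation vector to non-negative feature activations."""
    pre = matvec(W_enc, x)
    return relu([p + b for p, b in zip(pre, b_enc)])


def sae_loss(W_enc, b_enc, W_dec, x, l1_coeff):
    """Generic sparse autoencoder loss: reconstruction error + L1 sparsity."""
    f = encode(W_enc, b_enc, x)
    x_hat = matvec(W_dec, f)
    recon = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    sparsity = sum(f)  # features are non-negative, so this equals the L1 norm
    return recon + l1_coeff * sparsity, f


def pair_loss(W_enc, b_enc, W_dec, x_a, x_b, l1_coeff=0.01, contrast_coeff=0.1):
    """Loss on a pair of activations.

    ASSUMPTION: the overlap term below is a hypothetical contrastive penalty,
    not the objective used in the paper. It penalises features that fire on
    both members of the pair, nudging features to distinguish the two inputs.
    """
    loss_a, f_a = sae_loss(W_enc, b_enc, W_dec, x_a, l1_coeff)
    loss_b, f_b = sae_loss(W_enc, b_enc, W_dec, x_b, l1_coeff)
    overlap = sum(a * b for a, b in zip(f_a, f_b))
    return loss_a + loss_b + contrast_coeff * overlap
```

In this sketch, a pair whose active features are disjoint pays only the reconstruction and sparsity costs, while a pair sharing active features pays an extra penalty. A real implementation would train the weights by gradient descent, for example in PyTorch as used by the linked repository.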

Code & Models

Repositories

Xmaster6y/lczero-planning (PyTorch, official)

Taxonomy

Topics: Sports Analytics and Performance · Explainable Artificial Intelligence (XAI) · Anomaly Detection Techniques and Applications