Understanding Game-Playing Agents with Natural Language Annotations
Nicholas Tomlin, Andre He, Dan Klein

TL;DR
This paper introduces a dataset of human-annotated Go games and demonstrates how natural language comments can reveal the high-level concepts encoded within game-playing AI models, aiding interpretability.
Contribution
It contributes a dataset of 10K human-annotated Go games and shows, via linear probing, that domain-specific concepts (e.g., ko, atari) are encoded in the internal representations of game-playing agents.
Findings
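Game concepts are nontrivially encoded in two policy networks, one trained via imitation learning and one via reinforcement learning.
Mentions of domain-specific terms are most easily predicted from the later layers of both models, suggesting that these layers encode high-level abstractions.
Natural language annotations can serve as a tool for interpreting game-playing models.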
Abstract
We present a new dataset containing 10K human-annotated games of Go and show how these natural language annotations can be used as a tool for model interpretability. Given a board state and its associated comment, our approach uses linear probing to predict mentions of domain-specific terms (e.g., ko, atari) from the intermediate state representations of game-playing agents like AlphaGo Zero. We find these game concepts are nontrivially encoded in two distinct policy networks, one trained via imitation learning and another trained via reinforcement learning. Furthermore, mentions of domain-specific terms are most easily predicted from the later layers of both models, suggesting that these policy networks encode high-level abstractions similar to those used in the natural language annotations.
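The probing setup described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the activations are synthetic random vectors standing in for a policy network's intermediate features, the labels in the demo are generated from a hidden linear direction rather than from real comments, and the helper names (`keyword_labels`, `train_linear_probe`, `probe_accuracy`) are hypothetical.

```python
import re
import numpy as np


def keyword_labels(comments, keyword):
    """Return 1.0 where a comment mentions `keyword` as a whole word, else 0.0."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    return np.array([1.0 if pattern.search(c) else 0.0 for c in comments])


def train_linear_probe(features, labels, lr=0.1, epochs=500, l2=1e-3):
    """Fit a logistic-regression probe with full-batch gradient descent."""
    n, d = features.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        probs = 1.0 / (1.0 + np.exp(-(features @ w + b)))
        err = probs - labels  # gradient of the log loss w.r.t. the logits
        w -= lr * (features.T @ err / n + l2 * w)
        b -= lr * err.mean()
    return w, b


def probe_accuracy(features, labels, w, b):
    """Fraction of examples where the probe's thresholded output matches the label."""
    preds = (features @ w + b > 0).astype(float)
    return float((preds == labels).mean())


# Labels from comments: whole-word matching avoids counting "komi" as "ko".
demo_labels = keyword_labels(
    ["Black captures the ko.", "White is in atari.", "Komi is 6.5 points."], "ko"
)

# Assumption: synthetic 16-dimensional "activations" with a hidden linear concept.
rng = np.random.default_rng(0)
features = rng.normal(size=(400, 16))
direction = rng.normal(size=16)
labels = (features @ direction > 0).astype(float)

# Train on the first 300 examples, evaluate on the held-out 100.
w, b = train_linear_probe(features[:300], labels[:300])
test_acc = probe_accuracy(features[300:], labels[300:], w, b)
print(f"held-out probe accuracy: {test_acc:.2f}")
```

In a real analysis, `features` would be replaced by activations extracted from a given layer of the policy network for each annotated board state, and the probe would be trained once per layer and per keyword so that accuracies can be compared across depth.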
Taxonomy
Topics: Artificial Intelligence in Games · Topic Modeling · Natural Language Processing Techniques
