Understanding Game-Playing Agents with Natural Language Annotations

Nicholas Tomlin; Andre He; Dan Klein

arXiv:2204.07531 · cs.CL · April 18, 2022

TL;DR

This paper introduces a dataset of 10K human-annotated Go games and shows that the natural language comments can be used to probe which domain-specific concepts (e.g., ko, atari) are encoded inside game-playing agents, making those models easier to interpret.

Contribution

The paper contributes a new dataset of annotated Go games and uses linear probes on intermediate state representations to show that domain-specific concepts are encoded in two policy networks, one trained via imitation learning and one via reinforcement learning.

Findings

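01 Game concepts such as ko and atari are nontrivially encoded in both policy networks studied.

02 Mentions of domain-specific terms are most easily predicted from later layers, suggesting that these layers encode high-level abstractions.

03 Natural language annotations can serve as a tool for interpreting game-playing agents.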

Abstract

We present a new dataset containing 10K human-annotated games of Go and show how these natural language annotations can be used as a tool for model interpretability. Given a board state and its associated comment, our approach uses linear probing to predict mentions of domain-specific terms (e.g., ko, atari) from the intermediate state representations of game-playing agents like AlphaGo Zero. We find these game concepts are nontrivially encoded in two distinct policy networks, one trained via imitation learning and another trained via reinforcement learning. Furthermore, mentions of domain-specific terms are most easily predicted from the later layers of both models, suggesting that these policy networks encode high-level abstractions similar to those used in the natural language annotations.
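The probing setup described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' code: the activations are synthetic stand-ins for the intermediate representations of a policy network, and the names `mentions`, `train_probe`, `accuracy`, and `fake_activations` are hypothetical. The "early" and "late" layers are simulated by simply varying how strongly the label is embedded in the features.

```python
import math
import random
import re


def mentions(comment, term):
    """Binary probe label: 1 if the annotation mentions the term, else 0."""
    return int(term in re.findall(r"[a-z]+", comment.lower()))


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_probe(features, labels, epochs=200, lr=0.5):
    """Fit a logistic-regression (linear) probe with batch gradient descent."""
    dim, n = len(features[0]), len(features)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(features, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def accuracy(w, b, features, labels):
    """Fraction of examples where the probe's sign matches the label."""
    hits = 0
    for x, y in zip(features, labels):
        z = sum(wi * xi for wi, xi in zip(w, x)) + b
        hits += int((z > 0) == bool(y))
    return hits / len(labels)


def fake_activations(labels, signal):
    """ASSUMPTION: synthetic 3-d features; `signal` controls how linearly
    decodable the label is. Real probes would use network activations."""
    return [[signal * y + random.gauss(0, 1), random.gauss(0, 1),
             random.gauss(0, 1)] for y in labels]


random.seed(0)
# Toy annotations; the real dataset pairs board states with human comments.
comments = ["Black is in atari here", "White builds good shape"] * 100
labels = [mentions(c, "atari") for c in comments]

early = fake_activations(labels, signal=0.0)  # label not encoded
late = fake_activations(labels, signal=4.0)   # label linearly encoded

split = 150  # first 150 examples train the probe, the rest evaluate it
results = {}
for name, feats in [("early", early), ("late", late)]:
    w, b = train_probe(feats[:split], labels[:split])
    results[name] = accuracy(w, b, feats[split:], labels[split:])
    print(name, "layer probe accuracy:", results[name])
```

Comparing held-out probe accuracy across layers is what supports a claim like "later layers encode the concept more accessibly". In practice a control baseline is also useful, since a sufficiently expressive probe can score well on labels the model does not meaningfully represent.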

Code & Models

Repositories

andrehe02/go-probe
PyTorch · Official

Taxonomy

Topics: Artificial Intelligence in Games · Topic Modeling · Natural Language Processing Techniques