Reinforcement Learning in an Adaptable Chess Environment for Detecting Human-understandable Concepts

Patrik Hammersborg; Inga Strümke

arXiv:2211.05500·cs.LG·November 11, 2022

TL;DR

This paper presents a method to interpret the internal concepts learned by self-trained reinforcement learning agents, demonstrated on a lightweight chess environment to enhance transparency and understanding.

Contribution

It introduces a novel probing technique for understanding what concepts reinforcement learning agents internalize during training, applied to a computationally accessible chess environment.

Findings

01 Proposed a method to interpret learned concepts in RL agents

02 Applied the method to a lightweight chess environment

03 Enhanced understanding of agent robustness and decision-making

Abstract

Self-trained autonomous agents developed using machine learning are showing great promise in a variety of control settings, perhaps most remarkably in applications involving autonomous vehicles. The main challenge associated with self-learned agents in the form of deep neural networks is their black-box nature: it is impossible for humans to interpret deep neural networks. Therefore, humans cannot directly interpret the actions of deep-neural-network-based agents, or foresee their robustness in different scenarios. In this work, we demonstrate a method for probing which concepts self-learning agents internalise in the course of their training. For demonstration, we use a chess-playing agent in a fast and light environment developed specifically to be suitable for research groups without access to enormous computational resources or machine learning models.
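The abstract does not spell out how the probing is carried out. One common way to test whether a network has internalised a concept is to fit a simple linear classifier on a layer's activations and check how well it predicts the concept on held-out positions. The sketch below illustrates that general idea only. It is not the authors' implementation: the activations are synthetic stand-ins, and the names `train_linear_probe` and `probe_accuracy` are hypothetical.

```python
import numpy as np


def train_linear_probe(activations, labels, lr=0.1, epochs=1000):
    """Fit a logistic-regression probe with plain gradient descent.

    activations: (n_samples, n_units) array of layer activations.
    labels: (n_samples,) array of 0/1 concept labels.
    """
    n_samples, n_units = activations.shape
    w = np.zeros(n_units)
    b = 0.0
    for _ in range(epochs):
        logits = activations @ w + b
        probs = 1.0 / (1.0 + np.exp(-logits))
        err = probs - labels  # gradient of the log loss w.r.t. the logits
        w -= lr * (activations.T @ err) / n_samples
        b -= lr * err.mean()
    return w, b


def probe_accuracy(activations, labels, w, b):
    """Fraction of samples where the probe predicts the concept label."""
    preds = (activations @ w + b) > 0
    return float((preds == labels.astype(bool)).mean())


# ASSUMPTION: synthetic data standing in for real activations. In an actual
# experiment these would be recorded from the agent's network on chess
# positions, with labels marking whether a concept holds in each position.
rng = np.random.default_rng(0)
n_positions, n_units = 400, 16
concept = rng.integers(0, 2, size=n_positions)
activations = rng.normal(size=(n_positions, n_units))
activations[:, 3] += 4.0 * concept  # one unit carries the concept signal

# Train on the first 300 positions, evaluate on the remaining 100.
train, test = slice(0, 300), slice(300, 400)
w, b = train_linear_probe(activations[train], concept[train].astype(float))
test_accuracy = probe_accuracy(activations[test], concept[test], w, b)
print(f"held-out probe accuracy: {test_accuracy:.2f}")
```

A held-out accuracy well above chance suggests the concept is linearly decodable from that layer; accuracy near chance suggests it is not represented there, at least not linearly.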

Code & Models

Repositories

patrik-ha/explainable-minichess (tf · Official)

Taxonomy

Topics: Data Stream Mining Techniques · Explainable Artificial Intelligence (XAI) · Machine Learning and Data Classification

Methods: Self-Learning