Reinforcement Learning in an Adaptable Chess Environment for Detecting Human-understandable Concepts
Patrik Hammersborg, Inga Strümke

TL;DR
This paper presents a method for interpreting the concepts that self-trained reinforcement learning agents learn internally. The method is demonstrated in a lightweight chess environment, with the aim of making such agents more transparent and understandable.
Contribution
It introduces a novel probing technique for understanding what concepts reinforcement learning agents internalize during training, applied to a computationally accessible chess environment.
Findings
Proposed a method to interpret learned concepts in RL agents
Applied the method to a lightweight chess environment
Enhanced understanding of agent robustness and decision-making
Abstract
Self-trained autonomous agents developed using machine learning are showing great promise in a variety of control settings, perhaps most remarkably in applications involving autonomous vehicles. The main challenge associated with self-learned agents in the form of deep neural networks is their black-box nature: their internal computations are not directly interpretable by humans. As a result, humans cannot directly interpret the actions of agents based on deep neural networks, or foresee their robustness in different scenarios. In this work, we demonstrate a method for probing which concepts self-learning agents internalise in the course of their training. For demonstration, we use a chess-playing agent in a fast and lightweight environment developed specifically to be suitable for research groups without access to enormous computational resources or machine learning models.
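The abstract does not spell out the probing procedure. One common way to test whether a network has internalised a human-understandable concept is a linear probe: record the agent's hidden activations on many positions, label each position with whether the concept holds, and check whether a simple classifier can predict that label from the activations. The Python sketch below illustrates this general idea on synthetic data. It is an assumption for illustration, not necessarily the authors' method; the function names (`train_linear_probe`, `probe_accuracy`, `run_demo`), the Gaussian stand-in activations and the example concept are all hypothetical.

```python
import math
import random

# Hypothetical setup: each example is the hidden-layer activation vector an agent
# produced for one board position, paired with a binary label saying whether a
# human-defined concept (e.g. "side to move is up material") holds there.


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_linear_probe(activations, labels, lr=0.5, epochs=100):
    """Fit a logistic-regression probe p(concept | a) = sigmoid(w . a + b)."""
    dim = len(activations[0])
    w = [0.0] * dim
    b = 0.0
    n = len(activations)
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for a, y in zip(activations, labels):
            err = sigmoid(sum(wi * ai for wi, ai in zip(w, a)) + b) - y
            for i in range(dim):
                grad_w[i] += err * a[i]
            grad_b += err
        # Full-batch gradient descent step on the mean cross-entropy loss.
        w = [wi - lr * gi / n for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def probe_accuracy(w, b, activations, labels):
    """Fraction of held-out positions where the probe predicts the concept correctly."""
    correct = 0
    for a, y in zip(activations, labels):
        pred = 1 if sum(wi * ai for wi, ai in zip(w, a)) + b > 0 else 0
        correct += int(pred == y)
    return correct / len(labels)


def synthetic_activations(n, dim, rng):
    # Stand-in for real agent activations (hypothetical): i.i.d. Gaussian vectors.
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]


def run_demo(seed=0, dim=8):
    rng = random.Random(seed)
    train_x = synthetic_activations(400, dim, rng)
    test_x = synthetic_activations(200, dim, rng)

    # Hypothetical concept that is linearly encoded in the activations.
    def concept(a):
        return 1 if a[0] + 0.5 * a[1] > 0 else 0

    w, b = train_linear_probe(train_x, [concept(a) for a in train_x])
    encoded_acc = probe_accuracy(w, b, test_x, [concept(a) for a in test_x])

    # Control: labels unrelated to the activations should give roughly chance accuracy.
    w_r, b_r = train_linear_probe(train_x, [rng.randint(0, 1) for _ in train_x])
    control_acc = probe_accuracy(w_r, b_r, test_x, [rng.randint(0, 1) for _ in test_x])
    return encoded_acc, control_acc


if __name__ == "__main__":
    encoded_acc, control_acc = run_demo()
    print(f"encoded concept: {encoded_acc:.2f}, random control: {control_acc:.2f}")
```

Under these assumptions, a probe accuracy well above the random-label control suggests that the concept is linearly decodable from that layer, while near-chance accuracy suggests it is absent or encoded nonlinearly. Comparing against a control helps rule out a probe that merely fits noise in its training labels.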
Taxonomy
Topics: Data Stream Mining Techniques · Explainable Artificial Intelligence (XAI) · Machine Learning and Data Classification
Methods: Self-Learning
