Training Characteristic Functions with Reinforcement Learning: XAI-methods play Connect Four
Stephan Wäldchen, Felix Huber, Sebastian Pokutta

TL;DR
This paper introduces a method to train neural network-based characteristic functions for game-playing, specifically Connect Four, to improve explainability and evaluation of saliency attribution methods in AI.
Contribution
It presents a novel approach to train characteristic functions directly via reinforcement learning for game scenarios, enabling fairer and more practical XAI evaluations.
Findings
Neural characteristic functions can effectively play Connect Four.
Training with partial information improves XAI method comparison.
The approach reduces off-manifold evaluation issues.
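As a rough illustration of what "partial information" can look like for a Connect Four position, the sketch below hides the colour of occupied cells at random. The cell encoding (`EMPTY`, `RED`, `YELLOW`, `HIDDEN`), the function name `hide_colours`, and the choice to keep occupancy visible while hiding only the colour are assumptions made for this example, not details taken from the paper.

```python
import random

# Hypothetical cell encoding, chosen for this sketch only.
EMPTY, RED, YELLOW, HIDDEN = 0, 1, 2, 3

def hide_colours(board, p_hide, rng):
    """Return a copy of `board` in which each occupied cell has its colour
    replaced by HIDDEN independently with probability `p_hide`.
    Empty cells are never altered."""
    return [
        [HIDDEN if cell != EMPTY and rng.random() < p_hide else cell
         for cell in row]
        for row in board
    ]

# A small 3x4 excerpt of a board (a full Connect Four board is 6x7).
board = [
    [EMPTY, EMPTY,  EMPTY,  EMPTY],
    [EMPTY, RED,    EMPTY,  EMPTY],
    [RED,   YELLOW, YELLOW, EMPTY],
]

partially_hidden = hide_colours(board, 0.5, random.Random(0))
```

With `p_hide = 0.0` the agent sees the full position, and with `p_hide = 1.0` it sees only which cells are occupied; intermediate values give the partial inputs a characteristic function is meant to evaluate.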
Abstract
One of the goals of Explainable AI (XAI) is to determine which input components were relevant for a classifier decision. This is commonly known as saliency attribution. Characteristic functions (from cooperative game theory) are able to evaluate partial inputs and form the basis for theoretically "fair" attribution methods like Shapley values. Given only a standard classifier function, it is unclear how partial input should be realised. Instead, most XAI-methods for black-box classifiers like neural networks consider counterfactual inputs that generally lie off-manifold. This makes them hard to evaluate and easy to manipulate. We propose a setup to directly train characteristic functions in the form of neural networks to play simple two-player games. We apply this to the game of Connect Four by randomly hiding colour information from our agents during training. This has three…
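To make the link between characteristic functions and Shapley values concrete, the following minimal sketch computes exact Shapley values by enumerating all coalitions. The names `shapley_values` and `toy_v` and the three-player toy game are illustrative assumptions, not part of the paper; in the paper's setting the characteristic function would be a trained neural network evaluated on partially hidden boards, and exact enumeration would be infeasible at that scale.

```python
from itertools import combinations
from math import factorial

def shapley_values(v, n):
    """Exact Shapley values for players 0..n-1.

    `v` is a characteristic function mapping a frozenset of player
    indices to a real value. The cost grows exponentially in n.
    """
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            # Weight of coalitions of size k that exclude player i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of player i to coalition s.
                phi[i] += weight * (v(s | {i}) - v(s))
    return phi

def toy_v(s):
    """Hypothetical toy game: value 1 only if players 0 and 1 are both present."""
    return 1.0 if {0, 1} <= s else 0.0

phi = shapley_values(toy_v, 3)
print(phi)
```

In this toy game players 0 and 1 share the credit equally and player 2, who never changes the value, receives zero. The attributions sum to the value of the full coalition, which is the efficiency property that motivates calling Shapley values "fair".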
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning
