Using Part-based Representations for Explainable Deep Reinforcement Learning
Manos Kirtas, Konstantinos Tsampazis, Loukia Avramelou, Nikolaos Passalis, Anastasios Tefas

TL;DR
This paper introduces a non-negative training method for deep reinforcement learning that facilitates interpretable part-based representations, demonstrated on the Cartpole benchmark.
Contribution
It proposes a novel non-negative initialization and sign-preserving training approach to improve interpretability in RL models with part-based representations.
Findings
Enhanced interpretability of RL models through part-based representations.
Improved training stability and convergence with the proposed method.
Successful application on the Cartpole benchmark.
Abstract
Utilizing deep learning models to learn part-based representations holds significant potential for interpretable-by-design approaches, as these models incorporate latent causes obtained from feature representations through simple addition. However, training a part-based learning model presents challenges, particularly in enforcing non-negative constraints on the model's parameters, which can result in training difficulties such as instability and convergence issues. Moreover, applying such approaches in Deep Reinforcement Learning (RL) is even more demanding due to the inherent instabilities that impact many optimization methods. In this paper, we propose a non-negative training approach for actor models in RL, enabling the extraction of part-based representations that enhance interpretability while adhering to non-negative constraints. To this end, we employ a non-negative…
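
As a rough illustration of what a non-negativity constraint on parameters can look like in practice, the sketch below pairs a non-negative weight initialization with a projected gradient step that clips any weight that would turn negative back to zero. This is an assumption-laden example and not the authors' method: the abstract is truncated before it describes their approach, so the uniform initialization, the clipping rule, and the names `init_nonnegative` and `projected_step` are all hypothetical.

```python
import random


def init_nonnegative(n_out, n_in, scale=0.1, seed=0):
    """Draw every weight uniformly from [0, scale], so the layer starts non-negative.

    Hypothetical initializer for illustration; not taken from the paper.
    """
    rng = random.Random(seed)
    return [[rng.uniform(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]


def projected_step(weights, grads, lr=0.01):
    """Plain gradient step followed by clipping at zero.

    Clipping projects the updated weights back onto the non-negative orthant,
    which is one generic way to keep the sign constraint during training.
    """
    return [
        [max(0.0, w - lr * g) for w, g in zip(w_row, g_row)]
        for w_row, g_row in zip(weights, grads)
    ]


# Toy usage with made-up gradients (no real RL loss is computed here).
W = init_nonnegative(3, 4)
G = [[5.0, -1.0, 0.0, 100.0] for _ in range(3)]
W_new = projected_step(W, G, lr=0.1)

# Every weight stays non-negative, even where the raw step would go below zero.
assert all(w >= 0.0 for row in W_new for w in row)
```

Hard clipping is the simplest choice, but it can zero out many weights at once, which is one plausible source of the instability and convergence issues the abstract mentions.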
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Machine Learning in Healthcare
