Attribution-based Explanations for Markov Decision Processes
Paul Kobialka, Andrea Pferscher, Francesco Leofante, Erika Ábrahám, Silvia Lizeth Tapia Tarifa, Einar Broch Johnsen

TL;DR
This paper introduces attribution-based explanation techniques for Markov Decision Processes, enabling importance scoring of states and paths to interpret sequential decision-making agents.
Contribution
The paper formalizes what attributions should represent in MDPs and shows how importance scores for states and execution paths can be computed efficiently with strategy synthesis techniques, despite the non-determinism of an MDP.
Findings
Effective importance scores for states and paths in MDPs
Application to five case studies demonstrating interpretability
Bridging static attribution methods to sequential decision contexts
Abstract
Attribution techniques explain the outcome of an AI model by assigning a numerical score to its inputs. So far, these techniques have mainly focused on attributing importance to static input features at a single point in time, and thus fail to generalize to sequential decision-making settings. This paper fills this gap by introducing techniques to generate attribution-based explanations for Markov Decision Processes (MDPs). We give a formal characterization of what attributions should represent in MDPs, focusing on explanations that assign importance scores to both individual states and execution paths. We show how importance scores can be computed by leveraging techniques for strategy synthesis, enabling the efficient computation of these scores despite the non-determinism inherent in an MDP. We evaluate our approach on five case-studies, demonstrating its utility in providing…
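To make the idea of scoring states through strategy synthesis concrete, here is a minimal sketch in Python. It is not the paper's definition of importance: as an assumed stand-in, it scores a state by the gap between the maximal and minimal probability of reaching a goal from that state, both obtained by value iteration over all strategies. The toy MDP, the state and action names, and the functions `reach_prob` and `spread_importance` are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Set, Tuple

# An MDP as: state -> action -> list of (probability, successor).
# States with no actions are treated as absorbing.
MDP = Dict[str, Dict[str, List[Tuple[float, str]]]]


def reach_prob(mdp: MDP, goal: Set[str], maximize: bool,
               iters: int = 10_000, tol: float = 1e-12) -> Dict[str, float]:
    """Value iteration for the max (or min) probability of reaching `goal`."""
    # Starting from 0 outside the goal converges to the least fixed point,
    # which is the reachability probability for both max and min.
    values = {s: (1.0 if s in goal else 0.0) for s in mdp}
    pick = max if maximize else min
    for _ in range(iters):
        delta = 0.0
        for s, actions in mdp.items():
            if s in goal or not actions:
                continue  # goal and absorbing states keep their value
            new = pick(sum(p * values[t] for p, t in succ)
                       for succ in actions.values())
            delta = max(delta, abs(new - values[s]))
            values[s] = new
        if delta < tol:
            break
    return values


def spread_importance(mdp: MDP, goal: Set[str]) -> Dict[str, float]:
    """Assumed proxy score: how much strategy choice from s can change the outcome."""
    p_max = reach_prob(mdp, goal, maximize=True)
    p_min = reach_prob(mdp, goal, maximize=False)
    return {s: p_max[s] - p_min[s] for s in mdp}


# Hypothetical toy MDP: from s0 the agent commits to s1 or s2.
toy_mdp: MDP = {
    "s0": {"a": [(1.0, "s1")], "b": [(1.0, "s2")]},
    "s1": {"safe": [(0.9, "goal"), (0.1, "fail")],
           "risky": [(0.5, "goal"), (0.5, "fail")]},
    "s2": {"go": [(0.6, "goal"), (0.4, "fail")]},
    "goal": {},
    "fail": {},
}

scores = spread_importance(toy_mdp, {"goal"})
for state, score in scores.items():
    print(f"{state}: {score:.2f}")
```

In this toy model, `s2` scores zero because it offers only one action, while `s0` and `s1` score higher because the choice made there changes the probability of reaching the goal. A tool following the paper would replace this proxy with the paper's own formal characterization of importance.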
