Perspectives for Direct Interpretability in Multi-Agent Deep Reinforcement Learning

Yoann Poupart; Aurélie Beynier; Nicolas Maudet

arXiv:2502.00726·cs.AI·February 4, 2025

Open Access

TL;DR

This paper explores methods for directly interpreting multi-agent deep reinforcement learning models post-training, providing insights into agent behavior and emergent phenomena without modifying the models.

Contribution

It advocates for a scalable, model-agnostic approach to interpretability in MADRL using post hoc explanation techniques and discusses future research directions.

Findings

01. Relevance backpropagation and other post hoc methods can explain agent decisions.

02. Post hoc explanations reveal emergent behaviors and biases.

03. Direct interpretability strategies improve understanding of multi-agent coordination.

Abstract

Multi-Agent Deep Reinforcement Learning (MADRL) has proven efficient in solving complex problems in robotics or games, yet most of the trained models are hard to interpret. While learning intrinsically interpretable models remains a prominent approach, its scalability and flexibility are limited in handling complex tasks or multi-agent dynamics. This paper advocates for direct interpretability, generating post hoc explanations directly from trained models, as a versatile and scalable alternative, offering insights into agents' behaviour, emergent phenomena, and biases without altering models' architectures. We explore modern methods, including relevance backpropagation, knowledge editing, model steering, activation patching, sparse autoencoders and circuit discovery, to highlight their applicability to single-agent, multi-agent, and training process challenges. By addressing MADRL…

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Machine Learning and Data Classification