Interpretable Failure Analysis in Multi-Agent Reinforcement Learning Systems

Risal Shahriar Shefin; Debashis Gupta; Thai Le; Sarra Alqahtani

arXiv:2602.08104·cs.AI·February 24, 2026



TL;DR

This paper presents a gradient-based framework for interpretable failure detection and analysis in multi-agent reinforcement learning systems, enabling diagnosis of failure sources and propagation pathways.

Contribution

It introduces a novel two-stage gradient analysis method that provides interpretable diagnostics for failure detection and propagation in MARL systems.

Findings

01. Achieves 88.2-99.4% accuracy in Patient-0 detection

02. Provides geometric evidence for failure propagation pathways

03. Effective across multiple MARL environments

Abstract

Multi-Agent Reinforcement Learning (MARL) is increasingly deployed in safety-critical domains, yet methods for interpretable failure detection and attribution remain underdeveloped. We introduce a two-stage gradient-based framework that provides interpretable diagnostics for three critical failure analysis tasks: (1) detecting the true initial failure source (Patient-0); (2) validating why non-attacked agents may be flagged first due to domino effects; and (3) tracing how failures propagate through learned coordination pathways. Stage 1 performs interpretable per-agent failure detection via Taylor-remainder analysis of policy-gradient costs, declaring an initial Patient-0 candidate at the first threshold crossing. Stage 2 provides validation through geometric analysis of critic derivatives: first-order sensitivity and directional second-order curvature aggregated over causal windows to…
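To make the Stage 1 idea concrete, the following is a minimal sketch of a Taylor-remainder detector with a first-threshold-crossing rule. It is not the authors' implementation: the quadratic `cost`, its gradient, the threshold value, the toy perturbation schedule, and the helper names `taylor_remainder` and `first_crossing` are all assumptions chosen for illustration, and Stage 2 (critic-derivative validation) is not covered.

```python
import numpy as np

def taylor_remainder(cost, grad, x_ref, x_obs):
    """Absolute first-order Taylor remainder of `cost` around x_ref, evaluated at x_obs."""
    delta = x_obs - x_ref
    linear_estimate = cost(x_ref) + grad(x_ref) @ delta
    return abs(cost(x_obs) - linear_estimate)

def first_crossing(scores, threshold):
    """scores has shape (T, n_agents). Returns (agent, timestep) of the earliest
    threshold crossing, or None if no agent crosses. Ties go to the lowest agent index."""
    crossed = scores > threshold
    if not crossed.any():
        return None
    n_steps = scores.shape[0]
    # argmax on a boolean column gives the first True; never-crossing agents get n_steps.
    first_t = np.where(crossed.any(axis=0), crossed.argmax(axis=0), n_steps)
    agent = int(first_t.argmin())
    return agent, int(first_t[agent])

# Hypothetical stand-in for a per-agent policy-gradient cost (assumption, not from the paper).
cost = lambda x: 0.5 * float(x @ x)
grad = lambda x: x

n_steps, n_agents = 5, 3
x_ref = np.ones(2)                      # nominal input for every agent
deltas = np.zeros((n_steps, n_agents, 2))
deltas[2:, 1] = [1.0, 0.0]              # agent 1 is perturbed from t = 2
deltas[3:, 2] = [1.0, 0.0]              # agent 2 is affected one step later

scores = np.array([
    [taylor_remainder(cost, grad, x_ref, x_ref + deltas[t, i]) for i in range(n_agents)]
    for t in range(n_steps)
])

result = first_crossing(scores, threshold=0.1)
print(result)  # (agent index, timestep) of the initial Patient-0 candidate
```

In this toy setup the remainder is zero while an agent's input matches the nominal point and jumps once the input is perturbed, so the earliest crossing identifies the first-perturbed agent. As the abstract notes, in a real system a non-attacked agent may cross first because of domino effects, which is why a separate validation stage is needed.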


Code & Models

Datasets

Kylan12/Synthetic-AI-ML-Dataset
dataset · 42 downloads


Taxonomy

Topics: Adversarial Robustness in Machine Learning · Reinforcement Learning in Robotics · Explainable Artificial Intelligence (XAI)