MTTR-A: Measuring Cognitive Recovery Latency in Multi-Agent Systems
Barak Or

TL;DR
This paper introduces MTTR-A, a new metric for measuring how quickly multi-agent systems recover from reasoning failures, providing a quantitative foundation for assessing cognitive dependability.
Contribution
The paper proposes MTTR-A, which adapts classical dependability theory to quantify cognitive recovery latency in multi-agent systems (MAS), and establishes theoretical bounds linking recovery time to system reliability.
Findings
MTTR-A effectively measures cognitive recovery latency in MAS.
Empirical results show measurable recovery across different reflex strategies.
Theoretical bounds connect recovery latency with long-term cognitive uptime.
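As an illustration of how a recovery latency can bound long-run uptime, the classical steady-state availability relation from dependability theory is worked out below. This is offered as a hedged analogue only: it assumes MTBF denotes mean coherent uptime between failures, and the paper's actual bounds and definition of cognitive uptime may differ.

```latex
A = \frac{\mathrm{MTBF}}{\mathrm{MTBF} + \mathrm{MTTR\text{-}A}}
  = \frac{1}{1 + \rho},
\qquad
\rho = \frac{\mathrm{MTTR\text{-}A}}{\mathrm{MTBF}} \ge 0

% Since (1 - \rho)(1 + \rho) = 1 - \rho^2 \le 1, dividing by (1 + \rho) > 0 gives
A \ge 1 - \rho
```

For example, if recovery takes one tenth of the mean uptime ($\rho = 0.1$), then $A = 1/1.1 \approx 0.909$, which satisfies the lower bound $A \ge 0.9$.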
Abstract
Reliability in multi-agent systems (MAS) built on large language models is increasingly limited by cognitive failures rather than infrastructure faults. Existing observability tools describe failures but do not quantify how quickly distributed reasoning recovers once coherence is lost. We introduce MTTR-A (Mean Time-to-Recovery for Agentic Systems), a runtime reliability metric that measures cognitive recovery latency in MAS. MTTR-A adapts classical dependability theory to agentic orchestration, capturing the time required to detect reasoning drift and restore coherent operation. We further define complementary metrics, including mean time between failures (MTBF) and a normalized recovery ratio (NRR), and establish theoretical bounds linking recovery latency to long-run cognitive uptime. Using a LangGraph-based benchmark with simulated drift and reflex recovery, we empirically demonstrate measurable recovery…
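A minimal sketch of how these metrics could be computed from logged failure episodes is shown below. It assumes each episode records when reasoning drift begins and when coherent operation is restored, so MTTR-A covers both detection and recovery. The names `Episode`, `mttr_a`, `mtbf`, `availability`, and `nrr_hypothetical` are hypothetical, and MTBF is taken here as mean coherent uptime between a recovery and the next drift (some texts call this MTTF). The abstract does not define NRR, so `nrr_hypothetical` uses MTTR-A / MTBF purely for illustration; the paper's definition may differ.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Episode:
    """One hypothetical cognitive-failure episode; timestamps in seconds."""
    drift_start: float   # time the reasoning drift begins
    recovered_at: float  # time coherent operation is restored


def mttr_a(episodes: list[Episode]) -> float:
    """Mean time from drift onset to restored coherence (detection + recovery)."""
    if not episodes:
        raise ValueError("need at least one episode")
    return mean(e.recovered_at - e.drift_start for e in episodes)


def mtbf(episodes: list[Episode]) -> float:
    """Mean coherent uptime between one recovery and the next drift onset."""
    if len(episodes) < 2:
        raise ValueError("need at least two episodes to measure uptime gaps")
    gaps = [nxt.drift_start - cur.recovered_at
            for cur, nxt in zip(episodes, episodes[1:])]
    return mean(gaps)


def availability(mttr: float, mtbf_value: float) -> float:
    """Classical steady-state availability: MTBF / (MTBF + MTTR)."""
    return mtbf_value / (mtbf_value + mttr)


def nrr_hypothetical(mttr: float, mtbf_value: float) -> float:
    """ASSUMED normalized recovery ratio MTTR-A / MTBF (not the paper's definition)."""
    return mttr / mtbf_value


if __name__ == "__main__":
    log = [Episode(10.0, 12.0), Episode(30.0, 33.0), Episode(50.0, 51.0)]
    r, u = mttr_a(log), mtbf(log)
    print(f"MTTR-A={r:.2f}s  MTBF={u:.2f}s  "
          f"A={availability(r, u):.3f}  NRR(assumed)={nrr_hypothetical(r, u):.3f}")
```

With the sample log, recovery times of 2, 3, and 1 seconds give MTTR-A = 2.0 s, and uptime gaps of 18 and 17 seconds give MTBF = 17.5 s. Keeping each metric as a separate function lets a benchmark compare reflex strategies by swapping in different episode logs.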
Taxonomy
Topics: Formal Methods in Verification · Constraint Satisfaction and Optimization · AI-based Problem Solving and Planning
