Detecting Silent Failures in Multi-Agentic AI Trajectories

Divya Pathak; Harshit Kumar; Anuska Roy; Felix George; Mudit Verma; Pratibha Moogi

arXiv:2511.04032 · cs.AI · November 7, 2025

Open Access

TL;DR

This paper addresses the challenge of detecting silent failures in multi-agent AI systems powered by large language models, introducing datasets and benchmarks for anomaly detection to improve system reliability.

Contribution

The paper presents the first systematic study of anomaly detection in multi-agent AI trajectories, with curated datasets, benchmarks, and an analysis of detection methods.

Findings

01. Supervised (XGBoost) and semi-supervised (SVDD) methods perform comparably, reaching accuracies of up to 98% and 96%, respectively.

02. Two curated and labeled benchmark datasets contain 4,275 and 894 trajectories.

03. Provides insights and tools for future research in anomaly detection.

Abstract

Multi-Agentic AI systems, powered by large language models (LLMs), are inherently non-deterministic and prone to silent failures such as drift, cycles, and missing details in outputs, which are difficult to detect. We introduce the task of anomaly detection in agentic trajectories to identify these failures and present a dataset curation pipeline that captures user behavior, agent non-determinism, and LLM variation. Using this pipeline, we curate and label two benchmark datasets comprising 4,275 and 894 trajectories from Multi-Agentic AI systems. Benchmarking anomaly detection methods on these datasets, we show that supervised (XGBoost) and semi-supervised (SVDD) approaches perform comparably, achieving accuracies up to 98% and 96%, respectively. This work provides the first systematic study of anomaly detection in Multi-Agentic AI systems, offering datasets, benchmarks, and…
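To make the "cycles" failure mode named in the abstract concrete, here is a minimal Python sketch of one way a looping trajectory could be flagged. It is an illustration only, not the paper's method: the `(agent, action)` step schema, the function names `max_consecutive_repeats` and `looks_cyclic`, the thresholds, and the example trajectories are all assumptions made for this example. A signal like this might serve as one input feature to a detector such as the ones the paper benchmarks.

```python
from typing import Sequence, Tuple

# Hypothetical step schema (an assumption, not from the paper):
# each step is an (agent name, action name) pair.
Step = Tuple[str, str]


def max_consecutive_repeats(steps: Sequence[Step], max_period: int = 4) -> Tuple[int, int]:
    """Return (period, repeats) for the most-repeated back-to-back block.

    A block of `period` steps counts as repeating while the next `period`
    steps are identical to it. (0, 1) means no back-to-back repeat was found.
    """
    best = (0, 1)
    n = len(steps)
    for period in range(1, max_period + 1):
        start = 0
        while start + 2 * period <= n:
            block = steps[start:start + period]
            repeats = 1
            # Extend the run while the following block matches the first one.
            while (start + (repeats + 1) * period <= n
                   and steps[start + repeats * period:
                             start + (repeats + 1) * period] == block):
                repeats += 1
            if repeats > best[1]:
                best = (period, repeats)
            start += 1
    return best


def looks_cyclic(steps: Sequence[Step], min_repeats: int = 3, max_period: int = 4) -> bool:
    """Flag a trajectory whose steps loop at least `min_repeats` times in a row."""
    return max_consecutive_repeats(steps, max_period)[1] >= min_repeats


# Made-up trajectories for illustration only.
normal = [("planner", "plan"), ("search", "query"),
          ("writer", "draft"), ("reviewer", "approve")]
looping = ([("planner", "plan")]
           + [("search", "query"), ("writer", "draft")] * 3
           + [("reviewer", "approve")])

print(looks_cyclic(normal), looks_cyclic(looping))
```

A rule like this only catches exact back-to-back repetition. Near-duplicate loops, drift, and missing details would need richer features, which is presumably why the paper benchmarks learned detectors rather than hand-written rules.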

Taxonomy

Topics: Anomaly Detection Techniques and Applications · Topic Modeling · Explainable Artificial Intelligence (XAI)