AgentForesight: Online Auditing for Early Failure Prediction in Multi-Agent Systems

Boxuan Zhang; Jianing Zhu; Zeru Shi; Dongfang Liu; Ruixiang Tang

arXiv:2605.08715·cs.CL·May 15, 2026

TL;DR

AgentForesight introduces an online auditing framework for early failure detection in multi-agent systems, enabling intervention during trajectory unfolding rather than post-hoc diagnosis.

Contribution

This work presents an online auditing approach, a curated trajectory dataset (AFTraj-2K), and a reinforcement learning-trained auditor that outperforms existing models at early failure prediction.

Findings

01

AgentForesight-7B achieves up to +19.9% performance gain.

02

It reduces step localization error by 3×.

03

It outperforms GPT-4.1 and DeepSeek-V4-Pro on benchmarks.

Abstract

LLM-based multi-agent systems are increasingly deployed on long-horizon tasks, but a single decisive error is often accepted by downstream agents and cascades into trajectory-level failure. Existing work frames this as post-hoc failure attribution, diagnosing the responsible agent and step after the trajectory has ended. However, this paradigm forfeits any opportunity to intervene while the trajectory is still unfolding. In this work, we introduce AgentForesight, a framework that reframes this problem as online auditing: at each step of an unfolding trajectory, an auditor observes only the current prefix and must either continue the run or alarm at the earliest decisive error, without access to future steps. To this end, we curate AFTraj-2K, a corpus of agentic trajectories across Coding, Math, and Agentic domains, in which safe trajectories are retained under a strict curation…
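
The online auditing loop described in the abstract can be sketched roughly as follows. This is only an illustration of the prefix-only protocol, not the authors' implementation: the `Step` type, the `audit_online` function, and the keyword-based `toy_auditor` are all hypothetical names introduced here, and a real auditor would be a trained model rather than a string match.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Step:
    """One step of a multi-agent trajectory (hypothetical structure)."""
    agent: str
    content: str


# Hypothetical auditor interface: receives only the prefix seen so far
# and returns True to raise an alarm, False to let the run continue.
Auditor = Callable[[Sequence[Step]], bool]


def audit_online(trajectory: Sequence[Step], auditor: Auditor) -> Optional[int]:
    """Return the 0-based index of the first step that triggers an alarm,
    or None if the auditor lets the whole trajectory run."""
    for t in range(len(trajectory)):
        prefix = trajectory[: t + 1]  # the auditor never sees future steps
        if auditor(prefix):
            return t
    return None


def toy_auditor(prefix: Sequence[Step]) -> bool:
    # Placeholder rule for illustration only (an assumption, not the paper's
    # method): alarm when the latest step contains the literal string "ERROR".
    return "ERROR" in prefix[-1].content


trajectory = [
    Step("planner", "split the task into two subtasks"),
    Step("coder", "ERROR: wrote to the wrong file"),
    Step("reviewer", "accepted the change"),
]

alarm_step = audit_online(trajectory, toy_auditor)
print(alarm_step)
```

In this toy run the alarm fires at the second step, before the reviewer step is ever shown to the auditor, which is the kind of mid-trajectory intervention point that post-hoc attribution cannot provide.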

Code & Models

Datasets

ZBox008003/AFTraj
dataset · 200 downloads
