Uncovering Systemic and Environment Errors in Autonomous Systems Using Differential Testing

Yashwanthi Anand; Rahil P Mehta; Manish Motwani; Sandhya Saisubramanian

arXiv:2507.03870·cs.AI·January 16, 2026

TL;DR

This paper introduces AIProbe, a black-box differential testing method that determines whether an autonomous agent's failures stem from systemic agent errors or from environmental infeasibility, improving the accuracy of error attribution.

Contribution

AIProbe is a novel testing technique that generates diverse environment configurations and uses search-based planning to accurately attribute agent failures to systemic or environmental causes.
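The attribution idea can be illustrated with a minimal sketch. It assumes a toy grid world and uses breadth-first search as a stand-in for the search-based planner; the functions `plan_exists` and `attribute`, the grid encoding, and the label strings are all hypothetical and are not taken from the paper. If the agent fails but an independent search finds a solution, the failure is attributed to the agent; if no solution exists, the task is infeasible in that environment.

```python
from collections import deque

def plan_exists(grid, start, goal):
    """Breadth-first search over free cells; '#' marks an obstacle (assumed encoding)."""
    rows, cols = len(grid), len(grid[0])
    seen, frontier = {start}, deque([start])
    while frontier:
        r, c = frontier.popleft()
        if (r, c) == goal:
            return True
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] != "#" and (nr, nc) not in seen:
                seen.add((nr, nc))
                frontier.append((nr, nc))
    return False

def attribute(agent_succeeded, grid, start, goal):
    """Compare the agent's outcome with the planner's verdict (hypothetical labels)."""
    if agent_succeeded:
        return "no error"
    if plan_exists(grid, start, goal):
        return "agent error"        # a solution exists, so the agent is at fault
    return "environment error"      # no solution exists, even for an ideal agent

solvable = ["..#",
            "...",
            "#.."]
blocked = [".#.",
           "##.",
           "..."]

print(attribute(False, solvable, (0, 0), (2, 2)))
print(attribute(False, blocked, (0, 0), (2, 2)))
```

In this sketch the agent is treated as a black box: only its success or failure is observed, and the search acts as the reference against which that outcome is compared.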

Findings

01

AIProbe outperforms existing methods in error detection accuracy.

02

It effectively distinguishes between agent deficiencies and environment infeasibility.

03

Evaluation across multiple domains demonstrates its robustness.

Abstract

When an autonomous agent behaves undesirably, including failure to complete a task, it can be difficult to determine whether the behavior is due to a systemic agent error, such as flaws in the model or policy, or an environment error, where a task is inherently infeasible under a given environment configuration, even for an ideal agent. As agents and their environments grow more complex, identifying the error source becomes increasingly difficult but critical for reliable deployment. We introduce AIProbe, a novel black-box testing technique that applies differential testing to attribute undesirable agent behaviors either to agent deficiencies, such as modeling or training flaws, or to environmental infeasibility. AIProbe first generates diverse environmental configurations and tasks for testing the agent by modifying configurable parameters using Latin Hypercube sampling. It then…
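The Latin Hypercube sampling step mentioned above can be sketched in a few lines. This is a generic, standard-library implementation, not AIProbe's code; the function `latin_hypercube` and the two example parameters (grid size and obstacle density) are assumptions chosen for illustration. Each parameter range is split into as many equal strata as there are samples, and every stratum is used exactly once per parameter, which spreads the configurations across the whole range.

```python
import random

def latin_hypercube(n_samples, bounds, seed=0):
    """Draw n_samples points so that each dimension's n_samples strata are each hit once."""
    rng = random.Random(seed)
    columns = []
    for low, high in bounds:
        # One random point inside each stratum of the unit interval.
        points = [(i + rng.random()) / n_samples for i in range(n_samples)]
        # Shuffle so strata are paired at random across dimensions.
        rng.shuffle(points)
        columns.append([low + p * (high - low) for p in points])
    return [tuple(col[i] for col in columns) for i in range(n_samples)]

# Hypothetical configurable parameters: (grid size, obstacle density).
bounds = [(5.0, 20.0), (0.0, 0.4)]
samples = latin_hypercube(4, bounds)
for grid_size, density in samples:
    print(f"grid_size={grid_size:.2f} obstacle_density={density:.3f}")
```

Compared with independent uniform draws, this stratification guarantees that no region of any single parameter's range is left unsampled, which is one reason it suits generating diverse test configurations.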
