Where Did It Go Wrong? Capability-Oriented Failure Attribution for Vision-and-Language Navigation Agents

Jianming Chen; Yawen Wang; Junjie Wang; Xiaofei Xie; Shoubin Li; Qing Wang; Fanjiang Xu

arXiv:2604.25161·cs.MA·April 29, 2026

TL;DR

This paper introduces a capability-oriented testing method for Vision-and-Language Navigation agents that improves failure detection, attribution, and interpretability over existing system-level approaches.

Contribution

It presents an adaptive testing framework that combines seed selection and mutation, capability oracles, and a feedback mechanism to attribute failures in embodied agents to specific capabilities.

Findings

01

Discovers more failure cases than baseline methods.

02

Accurately pinpoints capability-specific deficiencies.

03

Provides actionable insights for agent improvement.

Abstract

Embodied agents in safety-critical applications such as Vision-Language Navigation (VLN) rely on multiple interdependent capabilities (e.g., perception, memory, planning, decision), making failures difficult to localize and attribute. Existing testing methods are largely system-level and provide limited insight into which capability deficiencies cause task failures. We propose a capability-oriented testing approach that enables failure detection and attribution by combining (1) adaptive test case generation via seed selection and mutation, (2) capability oracles for identifying capability-specific errors, and (3) a feedback mechanism that attributes failures to capabilities and guides further test generation. Experiments show that our method discovers more failure cases and more accurately pinpoints capability-level deficiencies than state-of-the-art baselines, providing more…
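
The three components named in the abstract can be pictured as a single loop. The following is a minimal sketch under stated assumptions, not the authors' implementation: the `TestCase` type, the energy-weighted seed selection, the phrase-appending mutation, the boolean trace format, and every function name here are hypothetical and chosen only for illustration.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class TestCase:
    instruction: str     # natural-language navigation instruction
    energy: float = 1.0  # selection weight, raised by feedback (assumed scheme)


Trace = Dict[str, bool]              # assumed: flags recorded during one episode
Agent = Callable[[TestCase], Trace]  # assumed: running the agent yields a trace
Oracle = Callable[[Trace], bool]     # True when a capability error is detected

# Hypothetical mutation material: phrases appended to an instruction.
DISTRACTORS = ["past the red chair", "after the second door", "near the window"]


def select_seed(pool: List[TestCase], rng: random.Random) -> TestCase:
    """Pick a seed with probability proportional to its energy."""
    weights = [case.energy for case in pool]
    return rng.choices(pool, weights=weights, k=1)[0]


def mutate(seed: TestCase, rng: random.Random) -> TestCase:
    """Toy mutation: extend the instruction with one distractor phrase."""
    return TestCase(f"{seed.instruction} {rng.choice(DISTRACTORS)}", seed.energy)


def attribute(trace: Trace, oracles: Dict[str, Oracle]) -> List[str]:
    """Return the capabilities whose oracle flags an error in this trace."""
    return [cap for cap, flags_error in oracles.items() if flags_error(trace)]


def run_campaign(agent: Agent, seeds: List[TestCase], oracles: Dict[str, Oracle],
                 budget: int, rng: random.Random) -> Dict[str, int]:
    """Generate `budget` test cases and count failures per blamed capability."""
    pool = list(seeds)
    failures = {cap: 0 for cap in oracles}
    for _ in range(budget):
        seed = select_seed(pool, rng)
        case = mutate(seed, rng)
        trace = agent(case)
        if trace["success"]:
            continue
        blamed = attribute(trace, oracles)
        for cap in blamed:
            failures[cap] += 1
        # Feedback: failing mutants rejoin the pool with more energy, so
        # later selection favours inputs that already exposed deficiencies.
        case.energy = seed.energy + len(blamed)
        pool.append(case)
    return failures


def stub_agent(case: TestCase) -> Trace:
    """Stand-in agent that always fails by missing a landmark."""
    return {"success": False, "missed_landmark": True, "revisited": False}


STUB_ORACLES: Dict[str, Oracle] = {
    "perception": lambda trace: trace["missed_landmark"],
    "memory": lambda trace: trace["revisited"],
}
```

Calling `run_campaign(stub_agent, [TestCase("go to the kitchen")], STUB_ORACLES, budget=5, rng=random.Random(0))` returns a per-capability failure count. In a real setting the stub would be replaced by an actual VLN agent and the oracles by checks on its intermediate outputs; how the paper defines those checks is not stated in the text above.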
