TL;DR
This paper analyzes silent AI failures in human-AI interactions, categorizes them into eight archetypes, and demonstrates their relevance across different AI capabilities and domains.
Contribution
It introduces a taxonomy of invisible AI failures, analyzes their patterns, and provides a dataset and code for failure monitoring in AI systems.
Findings
79% of AI failures are invisible: the user gave no overt indication of a problem
Failure rates have decreased with newer models but invisibility persists
Failure archetypes remain stable across AI capabilities
Abstract
AI systems fail silently far more often than they fail visibly. In an analysis of 100K human-AI interactions from the WildChat dataset, we find that 79% of AI failures are invisible: something went wrong but the user gave no overt indication that there was a problem. These invisible failures cluster into eight archetypes that help us characterize where and how AI systems are failing to meet users' needs. In addition, the archetypes show systematic co-occurrence patterns indicating higher-level failure types. To address the question of whether these archetypes will remain relevant as AI systems become more capable, we also created and annotated a counterfactual dataset in which WildChat's 2024-era responses are replaced by those from three present-day frontier LMs. This analysis indicates that failure rates have dropped substantially, but that the vast majority of failures remain…
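To make the headline metric concrete, here is a minimal sketch of how an invisible-failure share and archetype co-occurrence counts could be computed from annotated interactions. The record fields (`failed`, `user_signaled`, `archetypes`), the placeholder labels `A`, `B`, `C`, and the toy data are all assumptions for illustration; they are not the paper's schema, archetype names, or results.

```python
from collections import Counter
from itertools import combinations

# Hypothetical annotation records; field names and labels are illustrative only.
annotations = [
    {"failed": True, "user_signaled": False, "archetypes": {"A", "B"}},
    {"failed": True, "user_signaled": True, "archetypes": {"A"}},
    {"failed": False, "user_signaled": False, "archetypes": set()},
    {"failed": True, "user_signaled": False, "archetypes": {"A", "B", "C"}},
]

def invisible_share(records):
    """Fraction of failed interactions where the user gave no overt signal."""
    failures = [r for r in records if r["failed"]]
    if not failures:
        return 0.0
    invisible = [r for r in failures if not r["user_signaled"]]
    return len(invisible) / len(failures)

def archetype_cooccurrence(records):
    """Count how often each pair of archetypes appears in the same invisible failure."""
    counts = Counter()
    for r in records:
        if r["failed"] and not r["user_signaled"]:
            for pair in combinations(sorted(r["archetypes"]), 2):
                counts[pair] += 1
    return counts

share = invisible_share(annotations)
pairs = archetype_cooccurrence(annotations)
print(f"invisible share: {share:.2f}")
print(pairs.most_common())
```

Sorting the labels before pairing keeps each pair in a canonical order, so `("A", "B")` and `("B", "A")` are counted as the same co-occurrence.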
