A Survey on Failure Analysis and Fault Injection in AI Systems
Guangba Yu, Gou Tan, Haojia Huang, Zhenyu Zhang, Pengfei Chen, Roberto Natella, Zibin Zheng

TL;DR
This survey reviews failure analysis and fault injection methods in AI systems, with a particular focus on Large Language Models, to understand vulnerabilities, evaluate current tools, and identify gaps between simulated and real failures.
Contribution
It provides a comprehensive taxonomy of AI failures, evaluates existing fault injection tools, and highlights gaps to improve AI system resilience.
Findings
Identifies prevalent failure types in AI systems
Assesses capabilities of current fault injection tools
Highlights discrepancies between real-world failures and simulations
Abstract
The rapid advancement of Artificial Intelligence (AI) has led to its integration into various areas, especially with Large Language Models (LLMs) significantly enhancing capabilities in Artificial Intelligence Generated Content (AIGC). However, the complexity of AI systems has also exposed their vulnerabilities, necessitating robust methods for failure analysis (FA) and fault injection (FI) to ensure resilience and reliability. Despite the importance of these techniques, a comprehensive review of FA and FI methodologies in AI systems is lacking. This study fills this gap by presenting a detailed survey of existing FA and FI approaches across six layers of AI systems. We systematically analyze 160 papers and repositories to answer three research questions: (1) what are the prevalent failures in AI systems, (2) what types of faults can current FI tools simulate, (3) what gaps…
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Fault Detection and Control Systems · Software Testing and Debugging Techniques
Methods: Feedback Alignment
