Incident Analysis for AI Agents
Carson Ezell, Xavier Roberts-Gaal, Alan Chan

TL;DR
This paper introduces an incident analysis framework for AI agents, categorizing causes into system, contextual, and cognitive factors, and recommends essential information for incident reports to improve understanding and prevention of harm.
Contribution
It proposes a structured incident analysis framework for AI agents, adapting systems safety approaches to identify causes and guide incident reporting and investigation.
Findings
The framework categorizes incident causes into three types: system, contextual, and cognitive factors.
It recommends specific information to include in incident reports.
It provides guidelines for data retention and sharing.
Abstract
As AI agents become more widely deployed, we are likely to see an increasing number of incidents: events involving AI agent use that directly or indirectly cause harm. For example, agents could be prompt-injected to exfiltrate private information or make unauthorized purchases. Structured information about such incidents (e.g., user prompts) can help us understand their causes and prevent future occurrences. However, existing incident reporting processes are not sufficient for understanding agent incidents. In particular, such processes are largely based on publicly available data, which excludes useful, but potentially sensitive, information such as an agent's chain of thought or browser history. To inform the development of new, emerging incident reporting processes, we propose an incident analysis framework for agents. Drawing on systems safety approaches, our framework proposes…
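As a rough illustration of what a structured incident record might look like, the sketch below models the three cause categories named in the summary (system, contextual, cognitive) and the example data the abstract mentions (user prompts, chain of thought, browser history). The class names, field names, and the `public_view` method are hypothetical and are not taken from the paper; the choice of which fields count as shareable is likewise an assumption made only for this example.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional


class CauseCategory(Enum):
    """The three cause types named in the summary."""
    SYSTEM = "system"
    CONTEXTUAL = "contextual"
    COGNITIVE = "cognitive"


@dataclass
class IncidentReport:
    """Hypothetical report record; all field names are illustrative assumptions."""
    description: str
    user_prompts: List[str] = field(default_factory=list)
    chain_of_thought: Optional[str] = None                    # potentially sensitive
    browser_history: List[str] = field(default_factory=list)  # potentially sensitive
    causes: Dict[CauseCategory, List[str]] = field(default_factory=dict)

    def add_cause(self, category: CauseCategory, note: str) -> None:
        """Attach a free-text note under one of the three cause categories."""
        self.causes.setdefault(category, []).append(note)

    def public_view(self) -> dict:
        """Return only the fields assumed safe to publish (an assumption, not the paper's rule)."""
        return {
            "description": self.description,
            "causes": {c.value: list(notes) for c, notes in self.causes.items()},
        }


report = IncidentReport(
    description="Agent made an unauthorized purchase after a prompt injection.",
    user_prompts=["Find me a cheap flight."],
    chain_of_thought="(full reasoning trace would go here)",
)
# The category chosen here is illustrative only, not the paper's classification.
report.add_cause(CauseCategory.CONTEXTUAL, "Injected instructions on a visited web page.")
public = report.public_view()
```

Separating the full record from a reduced public view mirrors the abstract's point that publicly available data tends to exclude useful but sensitive details; a real process would need its own rules for what is retained and shared.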
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Risk and Safety Analysis
