Randomized Controlled Trials for Phishing Triage Agent
James Bono

TL;DR
This study evaluates the effectiveness of a domain-specific AI phishing triage agent in security operations centers through a randomized controlled trial, showing significant improvements in analyst productivity and accuracy.
Contribution
It is the first RCT assessing an AI agent's impact on phishing email triage, demonstrating substantial gains in efficiency and correctness in a real-world setting.
Findings
Agent-augmented analysts achieved up to 6.5 times as many true positives per analyst minute as the control group.
Verdict accuracy improved by 77% relative to the control group.
Analysts spent 53% more time on malicious emails when assisted by the agent.
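As a rough illustration of how a throughput figure such as "true positives per analyst minute" and a relative improvement can be computed, here is a minimal sketch. The function names and all input numbers are hypothetical and chosen only for easy arithmetic; they are not data from the study. The summary also does not say whether the 77% accuracy figure is a relative or absolute change, so the relative form used here is an assumption.

```python
def true_positives_per_minute(true_positives: int, analyst_minutes: float) -> float:
    """Throughput: correctly identified malicious emails per minute of analyst time."""
    if analyst_minutes <= 0:
        raise ValueError("analyst_minutes must be positive")
    return true_positives / analyst_minutes


def relative_change(treatment: float, control: float) -> float:
    """Relative change of treatment over control, e.g. 0.5 means a 50% increase."""
    if control == 0:
        raise ValueError("control must be non-zero")
    return (treatment - control) / control


# Hypothetical inputs, NOT figures from the paper.
control_rate = true_positives_per_minute(true_positives=8, analyst_minutes=64.0)
treated_rate = true_positives_per_minute(true_positives=16, analyst_minutes=32.0)

# Ratio of treated throughput to control throughput ("N times as many").
throughput_ratio = treated_rate / control_rate

# Hypothetical accuracies, again only for illustration.
accuracy_gain = relative_change(treatment=0.75, control=0.5)

print(f"throughput ratio: {throughput_ratio:.1f}x")
print(f"relative accuracy change: {accuracy_gain:.0%}")
```

Note that "6.5 times as many" is a ratio of rates, whereas "77% improvement" read as a relative change corresponds to a ratio of 1.77; the two kinds of figures are not directly comparable.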
Abstract
Security operations centers (SOCs) face a persistent challenge: efficiently triaging a high volume of user-reported phishing emails while maintaining robust protection against threats. This paper presents the first randomized controlled trial (RCT) evaluating the impact of a domain-specific AI agent - the Microsoft Security Copilot Phishing Triage Agent - on analyst productivity and accuracy. Our results demonstrate that agent-augmented analysts achieved up to 6.5 times as many true positives per analyst minute and a 77% improvement in verdict accuracy compared to a control group. The agent's queue prioritization and verdict explanations were both significant drivers of efficiency. Behavioral analysis revealed that agent-augmented analysts reallocated their attention, spending 53% more time on malicious emails, and were not prone to rubber-stamping the agent's malicious verdicts. These…
Taxonomy
Topics: Spam and Phishing Detection · Information and Cyber Security · Personal Information Management and User Behavior
