Adaptive alert prioritisation in security operations centres via learning to defer with human feedback
Fatemeh Jalalvand, Mohan Baruwal Chhetri, Surya Nepal, Cécile Paris

TL;DR
This paper introduces an adaptive alert prioritisation framework for security operations centres that uses deep reinforcement learning from human feedback to improve decision-making, reduce false positives, and lessen analyst workload.
Contribution
It proposes L2DHF, a novel adaptive learning framework that dynamically incorporates human feedback to optimise alert deferral decisions in SOCs, outperforming models with static deferral policies.
Findings
L2DHF achieves 13-16% higher accuracy on UNSW-NB15.
L2DHF achieves 60-67% higher accuracy on CICIDS2017.
L2DHF reduces high-category alert misprioritisation by 98%.
Abstract
Alert prioritisation (AP) is crucial for security operations centres (SOCs) to manage the overwhelming volume of alerts and ensure timely detection and response to genuine threats, while minimising alert fatigue. Although predictive AI can process large alert volumes and identify known patterns, it struggles with novel and evolving scenarios that demand contextual understanding and nuanced judgement. A promising solution is Human-AI teaming (HAT), which combines human expertise with AI's computational capabilities. Learning to Defer (L2D) operationalises HAT by enabling AI to "defer" uncertain or unfamiliar cases to human experts. However, traditional L2D models rely on static deferral policies that do not evolve with experience, limiting their ability to learn from human feedback and adapt over time. To overcome this, we introduce Learning to Defer with Human Feedback (L2DHF), an…
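The abstract contrasts static deferral policies with one that keeps learning from analyst feedback. The toy sketch below illustrates that contrast only. It is not L2DHF, which the authors describe as using deep reinforcement learning. Instead it swaps in a much simpler running-average, bandit-style value estimate per context. The names `Alert`, `static_defer`, `AdaptiveDeferral`, the confidence binning, the 0.8 threshold and the reward values `DEFER_COST` and `MISPRIORITY_COST` are all hypothetical assumptions chosen for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical reward settings (assumptions for illustration, not from the paper).
DEFER_COST = 0.2        # analyst effort spent on a deferred alert
MISPRIORITY_COST = 1.0  # penalty when the model's accepted priority is wrong


@dataclass
class Alert:
    predicted: str     # model's predicted priority, e.g. "high"
    confidence: float  # model's confidence in [0, 1]


def static_defer(alert: Alert, threshold: float = 0.8) -> bool:
    """Static rule: defer whenever confidence is below a fixed threshold."""
    return alert.confidence < threshold


class AdaptiveDeferral:
    """Toy deferral policy that keeps a running mean reward per (context, action).

    A simplified stand-in for learning from human feedback, not the paper's DRL method.
    """

    def __init__(self, bins: int = 5):
        self.bins = bins
        self.value = defaultdict(float)  # (context, action) -> mean reward
        self.count = defaultdict(int)    # (context, action) -> number of updates

    def _context(self, alert: Alert):
        # Context = predicted priority plus a coarse confidence bin.
        b = min(int(alert.confidence * self.bins), self.bins - 1)
        return (alert.predicted, b)

    def should_defer(self, alert: Alert) -> bool:
        ctx = self._context(alert)
        # With no feedback yet for this context, defer so an analyst labels it.
        if self.count[(ctx, "accept")] == 0:
            return True
        return self.value[(ctx, "defer")] > self.value[(ctx, "accept")]

    def feedback(self, alert: Alert, true_priority: str) -> None:
        """Update both actions from the analyst's verdict on the alert."""
        ctx = self._context(alert)
        rewards = {
            "defer": -DEFER_COST,
            "accept": 1.0 if alert.predicted == true_priority else -MISPRIORITY_COST,
        }
        for action, r in rewards.items():
            key = (ctx, action)
            self.count[key] += 1
            self.value[key] += (r - self.value[key]) / self.count[key]  # incremental mean


policy = AdaptiveDeferral()
a = Alert(predicted="high", confidence=0.55)
print(policy.should_defer(a))  # True: no feedback yet for this context
for _ in range(10):
    policy.feedback(a, true_priority="high")  # analysts confirm the model
print(policy.should_defer(a))  # False: accepting now scores higher than deferring
print(static_defer(a))         # True: the fixed threshold still defers it
```

The adaptive policy can also learn to defer alerts that are confidently misprioritised, which a fixed confidence threshold never does; that is the kind of behaviour a static deferral policy cannot pick up from experience.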
