Unsupervised Learning of Distributional Properties can Supplement Human Labeling and Increase Active Learning Efficiency in Anomaly Detection
Jaturong Kongmanee, Mark Chignell, Khilan Jerath, Abhay Raman

TL;DR
This paper introduces an adaptive active learning strategy that combines distributional properties and model uncertainty to efficiently identify rare anomalies, improving labeling efficiency in cybersecurity data exfiltration detection.
Contribution
It presents a novel active learning method that leverages unsupervised anomaly detection and data distribution to select more informative samples, enhancing anomaly detection performance.
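The summary here does not give the paper's actual scoring rule, so the following is only a minimal sketch of the general idea: blend an unsupervised anomaly score with classifier uncertainty, and shift weight from the former to the latter as labels accumulate. The z-score detector, the linear `warmup` schedule, and all function names are assumptions made for illustration, not the authors' method.

```python
import math
from typing import List, Sequence


def anomaly_scores(X: Sequence[Sequence[float]]) -> List[float]:
    """Unsupervised stand-in detector (assumption): mean absolute z-score per row."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    # Fall back to 1.0 when a feature has zero variance to avoid division by zero.
    stds = [
        math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / n) or 1.0
        for j in range(d)
    ]
    return [sum(abs(row[j] - means[j]) / stds[j] for j in range(d)) / d for row in X]


def uncertainty(probs: Sequence[float]) -> List[float]:
    """Binary-classifier uncertainty: 1.0 at p = 0.5, 0.0 at p = 0 or p = 1."""
    return [1.0 - abs(2.0 * p - 1.0) for p in probs]


def min_max(values: Sequence[float]) -> List[float]:
    """Rescale scores to [0, 1] so the two signals are comparable."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def select_batch(
    X: Sequence[Sequence[float]],
    probs: Sequence[float],
    n_labeled: int,
    batch_size: int,
    warmup: int = 50,  # hypothetical schedule length, not taken from the paper
) -> List[int]:
    """Return indices of the unlabeled cases to send to the annotator next."""
    # alpha is the weight on model uncertainty; it grows as labels accumulate.
    alpha = min(1.0, n_labeled / warmup)
    a = min_max(anomaly_scores(X))
    u = min_max(uncertainty(probs))
    combined = [(1.0 - alpha) * ai + alpha * ui for ai, ui in zip(a, u)]
    order = sorted(range(len(X)), key=lambda i: combined[i], reverse=True)
    return order[:batch_size]


if __name__ == "__main__":
    X = [[0.0], [0.1], [-0.1], [0.05], [10.0]]
    probs = [0.5, 0.9, 0.1, 0.95, 0.99]
    print(select_batch(X, probs, n_labeled=0, batch_size=1))   # distribution-driven
    print(select_batch(X, probs, n_labeled=50, batch_size=1))  # uncertainty-driven
```

With no labels yet, the sketch ranks purely by distributional outlyingness, which is one way to surface rare cases before a classifier is trustworthy; once `n_labeled` reaches `warmup`, it ranks purely by uncertainty.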
Findings
Outperforms existing active learning methods on benchmark datasets
Early-stage unsupervised anomaly detection improves classifier training
Effective in highly unbalanced real-world email data
Abstract
Exfiltration of data via email is a serious cybersecurity threat for many organizations. Detecting data exfiltration (anomaly) patterns typically requires labeling, most often done by a human annotator, to reduce the high number of false alarms. Active Learning (AL) is a promising approach for labeling data efficiently, but it needs to choose an efficient order in which cases are to be labeled, and there are uncertainties as to what scoring procedure should be used to prioritize cases for labeling, especially when detecting rare cases of interest is crucial. We propose an adaptive AL sampling strategy that leverages the underlying prior data distribution, as well as model uncertainty, to produce batches of cases to be labeled that contain instances of rare anomalies. We show that (1) the classifier benefits from a batch of representative and informative instances of both normal and…
Taxonomy
Topics: Network Security and Intrusion Detection · Spam and Phishing Detection · Anomaly Detection Techniques and Applications
