Unsupervised Learning of Distributional Properties can Supplement Human Labeling and Increase Active Learning Efficiency in Anomaly Detection

Jaturong Kongmanee; Mark Chignell; Khilan Jerath; Abhay Raman

arXiv:2307.08782 · cs.LG · July 19, 2023 · 1 cites

TL;DR

This paper introduces an adaptive active learning strategy that combines distributional properties and model uncertainty to efficiently identify rare anomalies, improving labeling efficiency in cybersecurity data exfiltration detection.

Contribution

It presents a novel active learning method that leverages unsupervised anomaly detection and data distribution to select more informative samples, enhancing anomaly detection performance.
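
The selection idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the mean absolute z-score stands in for whatever unsupervised anomaly detector the paper uses, the fixed `weight` parameter replaces the paper's adaptive scheme (which this summary does not specify), and all function names are hypothetical.

```python
import math

def anomaly_scores(rows):
    """Unsupervised stand-in (assumption): mean absolute z-score per row."""
    n, dims = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dims)]
    stds = []
    for j in range(dims):
        var = sum((r[j] - means[j]) ** 2 for r in rows) / n
        stds.append(math.sqrt(var) or 1.0)  # constant feature: avoid dividing by zero
    return [
        sum(abs(r[j] - means[j]) / stds[j] for j in range(dims)) / dims
        for r in rows
    ]

def uncertainty(p):
    """Model uncertainty for a binary classifier: 1.0 at p = 0.5, 0.0 at p = 0 or 1."""
    return 1.0 - abs(2.0 * p - 1.0)

def min_max(values):
    """Rescale values to [0, 1] so the two scores are comparable."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]

def select_batch(rows, probs, batch_size, weight):
    """Rank unlabeled rows by a weighted mix of anomaly score and uncertainty.

    weight = 1.0 uses only the distributional (anomaly) score;
    weight = 0.0 uses only classifier uncertainty.
    Returns the indices of the top `batch_size` rows to send for labeling.
    """
    a = min_max(anomaly_scores(rows))
    u = [uncertainty(p) for p in probs]
    combined = [weight * ai + (1.0 - weight) * ui for ai, ui in zip(a, u)]
    order = sorted(range(len(rows)), key=lambda i: combined[i], reverse=True)
    return order[:batch_size]

# Toy data (made up for illustration): one outlier and one uncertain prediction.
rows = [[0.0], [0.1], [-0.1], [0.05], [5.0]]
probs = [0.1, 0.5, 0.1, 0.1, 0.1]  # hypothetical classifier P(anomaly)
batch = select_batch(rows, probs, batch_size=2, weight=0.5)
```

With the toy data, the batch contains both the distributional outlier (index 4) and the case the classifier is least sure about (index 1). A weight near 1.0 in early rounds, when the classifier has seen few labels, would lean on the unsupervised score, which is one plausible reading of the early-stage finding reported below.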

Findings

01. Outperforms existing active learning methods on benchmark datasets

02. Early-stage unsupervised anomaly detection improves classifier training

03. Effective in highly unbalanced real-world email data

Abstract

Exfiltration of data via email is a serious cybersecurity threat for many organizations. Detecting data exfiltration (anomaly) patterns typically requires labeling, most often done by a human annotator, to reduce the high number of false alarms. Active Learning (AL) is a promising approach for labeling data efficiently, but it needs to choose an efficient order in which cases are to be labeled, and there are uncertainties as to what scoring procedure should be used to prioritize cases for labeling, especially when detecting rare cases of interest is crucial. We propose an adaptive AL sampling strategy that leverages the underlying prior data distribution, as well as model uncertainty, to produce batches of cases to be labeled that contain instances of rare anomalies. We show that (1) the classifier benefits from a batch of representative and informative instances of both normal and…

Taxonomy

Topics: Network Security and Intrusion Detection · Spam and Phishing Detection · Anomaly Detection Techniques and Applications