Setting the threshold for high throughput detectors: A mathematical approach for ensembles of dynamic, heterogeneous, probabilistic anomaly detectors
Robert A. Bridges, Jessie D. Jamieson, and Joel W. Reed

TL;DR
This paper introduces a mathematical approach and an algorithm for setting thresholds in ensembles of heterogeneous, adaptive anomaly detectors to control alert rates effectively in high-volume cyber security environments.
Contribution
It provides a rigorous, a priori threshold-setting algorithm for dynamic, heterogeneous detectors and analyzes the impact of data distribution knowledge on its effectiveness.
Findings
The algorithm effectively regulates alert rates in large detector ensembles.
Empirical validation on real network data demonstrates practical utility.
Analysis reveals when model refitting is necessary due to distribution shifts.
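To make the idea of regulating alert rates concrete, here is a minimal sketch. It is not the paper's algorithm: it assumes a single detector whose model is a Poisson distribution over event counts with a known rate `lam`, and it assumes alerts are raised by thresholding a p-value (the probability, under the model, of seeing an outcome no more likely than the one observed). All function names and parameters are hypothetical. Under those assumptions, choosing the threshold `alpha` as the desired number of alerts divided by the number of events bounds the expected alert rate, provided the data really follow the model.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Poisson probability mass at k, computed in log space for stability."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def p_value(k: int, lam: float, support_max: int = 100) -> float:
    """Total model mass of outcomes that are no more likely than k.

    The support is truncated at support_max (an assumption of this sketch);
    choose it large enough that the remaining tail mass is negligible.
    """
    fk = poisson_pmf(k, lam)
    return sum(p for j in range(support_max) if (p := poisson_pmf(j, lam)) <= fk)


def threshold_for_alert_budget(expected_alerts: float, n_events: int) -> float:
    """Threshold alpha that targets at most expected_alerts per n_events."""
    return expected_alerts / n_events


def is_alert(k: int, lam: float, alpha: float) -> bool:
    """Raise an alert when the observation's p-value is at or below alpha."""
    return p_value(k, lam) <= alpha


def model_alert_rate(lam: float, alpha: float, support_max: int = 100) -> float:
    """Exact probability, under the model, that a single event alerts."""
    return sum(
        poisson_pmf(k, lam)
        for k in range(support_max)
        if p_value(k, lam, support_max) <= alpha
    )


if __name__ == "__main__":
    # Budget: about 10 alerts per 100,000 events.
    alpha = threshold_for_alert_budget(10, 100_000)
    print("alpha:", alpha)
    print("alert rate under the model:", model_alert_rate(5.0, alpha))
    print("count 30 alerts:", is_alert(30, 5.0, alpha))
```

The bound only holds while the observed data match the fitted model. If the true rate drifts away from `lam`, the realized alert rate can exceed `alpha`, which is the situation in which refitting the model becomes necessary.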
Abstract
Anomaly detection (AD) has garnered ample attention in security research, as such algorithms complement existing signature-based methods and promise detection of never-before-seen attacks. Cyber operations manage a high volume of heterogeneous log data; hence, AD in such operations involves multiple (e.g., per IP, per data type) ensembles of detectors modeling heterogeneous characteristics (e.g., rate, size, type), often with adaptive online models producing alerts in near real time. Because of the high data volume, setting the threshold for each detector in such a system is an essential yet underdeveloped configuration issue that, if slightly mistuned, can leave the system useless, either producing a myriad of alerts and flooding downstream systems, or giving none. In this work, we build on the foundations of Ferragut et al. to provide a set of rigorous results for understanding the…
