Metric Matters: A Formal Evaluation of Similarity Measures in Active Learning for Cyber Threat Intelligence
Sidahmed Benabderrahmane, Talal Rahwan

TL;DR
This paper evaluates how the choice of similarity measure affects active-learning-based anomaly detection in cyber threat intelligence, showing that the metric significantly influences model performance, convergence, and label efficiency.
Contribution
It provides a formal analysis of similarity measures in active learning for cyber threat detection, highlighting their impact on model convergence and detection accuracy.
Findings
Similarity measure choice affects model convergence.
Different metrics influence anomaly detection accuracy.
Optimal similarity functions improve label efficiency.
Abstract
Advanced Persistent Threats (APTs) pose a severe challenge to cyber defense due to their stealthy behavior and the extreme class imbalance inherent in detection datasets. To address these issues, we propose a novel active learning-based anomaly detection framework that leverages similarity search to iteratively refine the decision space. Built upon an Attention-Based Autoencoder, our approach uses feature-space similarity to identify normal-like and anomaly-like instances, thereby enhancing model robustness with minimal oracle supervision. Crucially, we perform a formal evaluation of various similarity measures to understand their influence on sample selection and anomaly ranking effectiveness. Through experiments on diverse datasets, including DARPA Transparent Computing APT traces, we demonstrate that the choice of similarity metric significantly impacts model convergence, anomaly…
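The abstract states that the framework uses feature-space similarity to pick normal-like and anomaly-like instances for oracle labeling, and that the choice of metric changes which samples get selected. The sketch below only illustrates that general idea; it is not the authors' method. Several pieces are assumptions made for illustration. The 2-D vectors stand in for latent features from the Attention-Based Autoencoder. The scoring rule (similarity to the nearest known anomaly minus similarity to the nearest known normal) is our own choice. The helper names `pairwise_similarity`, `anomaly_likeness`, and `select_queries` are hypothetical.

```python
import numpy as np


def pairwise_similarity(a, b, metric):
    """Similarity matrix between rows of a (n, d) and b (m, d); higher means more similar.

    Hypothetical helper: distances are negated so every metric ranks in the same direction.
    """
    if metric == "cosine":
        # Normalize rows (guarding against zero vectors), then take dot products.
        a_n = a / np.maximum(np.linalg.norm(a, axis=1, keepdims=True), 1e-12)
        b_n = b / np.maximum(np.linalg.norm(b, axis=1, keepdims=True), 1e-12)
        return a_n @ b_n.T
    diff = a[:, None, :] - b[None, :, :]  # shape (n, m, d)
    if metric == "euclidean":
        return -np.sqrt((diff ** 2).sum(axis=-1))
    if metric == "manhattan":
        return -np.abs(diff).sum(axis=-1)
    raise ValueError(f"unknown metric: {metric}")


def anomaly_likeness(pool, normals, anomalies, metric):
    """Assumed score: similarity to nearest labeled anomaly minus similarity to nearest labeled normal."""
    s_anom = pairwise_similarity(pool, anomalies, metric).max(axis=1)
    s_norm = pairwise_similarity(pool, normals, metric).max(axis=1)
    return s_anom - s_norm


def select_queries(pool, normals, anomalies, metric, k):
    """Indices of the k most anomaly-like unlabeled samples to send to the oracle."""
    scores = anomaly_likeness(pool, normals, anomalies, metric)
    return np.argsort(-scores, kind="stable")[:k]


if __name__ == "__main__":
    # Toy latent vectors (hypothetical): one labeled normal, one labeled anomaly.
    normals = np.array([[1.0, 0.0]])
    anomalies = np.array([[0.0, 1.0]])
    # Unlabeled pool: a near copy of the anomaly, a distant point in the same
    # direction as the anomaly, and a near copy of the normal.
    pool = np.array([[0.1, 1.0], [0.0, 5.0], [1.0, 0.1]])
    for metric in ("cosine", "euclidean", "manhattan"):
        print(metric, select_queries(pool, normals, anomalies, metric, k=3).tolist())
    # cosine [1, 0, 2]
    # euclidean [0, 1, 2]
    # manhattan [1, 0, 2]
```

In this toy pool, cosine similarity ignores magnitude. It therefore ranks the distant point `[0, 5]` as the most anomaly-like, while Euclidean distance prefers the near copy of the anomaly. Even this tiny example shows that the metric alone can change which sample an oracle is asked to label first.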
