Metric Matters: A Formal Evaluation of Similarity Measures in Active Learning for Cyber Threat Intelligence

Sidahmed Benabderrahmane; Talal Rahwan

arXiv:2508.19019 · cs.LG · August 27, 2025

TL;DR

This paper evaluates how different similarity measures affect active learning-based anomaly detection in cyber threat intelligence, demonstrating that the choice of metric significantly influences model performance and efficiency.

Contribution

The paper provides a formal analysis of similarity measures in active learning for cyber threat detection, highlighting their impact on model convergence and detection accuracy.

Findings

1. Similarity measure choice affects model convergence.
2. Different metrics influence anomaly detection accuracy.
3. Optimal similarity functions improve label efficiency.

Abstract

Advanced Persistent Threats (APTs) pose a severe challenge to cyber defense due to their stealthy behavior and the extreme class imbalance inherent in detection datasets. To address these issues, we propose a novel active learning-based anomaly detection framework that leverages similarity search to iteratively refine the decision space. Built upon an Attention-Based Autoencoder, our approach uses feature-space similarity to identify normal-like and anomaly-like instances, thereby enhancing model robustness with minimal oracle supervision. Crucially, we perform a formal evaluation of various similarity measures to understand their influence on sample selection and anomaly ranking effectiveness. Through experiments on diverse datasets, including DARPA Transparent Computing APT traces, we demonstrate that the choice of similarity metric significantly impacts model convergence, anomaly…
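The abstract describes using feature-space similarity to separate normal-like from anomaly-like instances during active learning, and comparing several similarity measures for that step. The following is a minimal sketch of that general idea under our own assumptions, not the authors' algorithm. Synthetic Gaussian vectors stand in for Attention-Based Autoencoder embeddings. The scoring rule (maximum similarity to oracle-labeled anomalies minus maximum similarity to oracle-labeled normals), the mapping of distances to similarities via 1/(1 + d), and the helper names `METRICS`, `anomaly_likeness`, and `select_queries` are all hypothetical choices made for illustration.

```python
import numpy as np

# Pairwise similarity functions on row-wise feature matrices: (n, d), (m, d) -> (n, m).
def cosine_sim(a, b):
    a_n = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_n = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_n @ b_n.T

def euclidean_sim(a, b):
    # Assumed mapping: Euclidean distance -> similarity in (0, 1].
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    return 1.0 / (1.0 + d)

def manhattan_sim(a, b):
    # Assumed mapping: L1 distance -> similarity in (0, 1].
    d = np.abs(a[:, None, :] - b[None, :, :]).sum(axis=2)
    return 1.0 / (1.0 + d)

# Hypothetical registry of the measures being compared.
METRICS = {"cosine": cosine_sim, "euclidean": euclidean_sim, "manhattan": manhattan_sim}

def anomaly_likeness(z_pool, z_anom, z_norm, metric):
    """Hypothetical score: max similarity to any labeled anomaly minus
    max similarity to any labeled normal (higher = more anomaly-like)."""
    sim = METRICS[metric]
    return sim(z_pool, z_anom).max(axis=1) - sim(z_pool, z_norm).max(axis=1)

def select_queries(z_pool, z_anom, z_norm, metric, k):
    """Return indices of the k most anomaly-like pool points (to send to the oracle) and all scores."""
    score = anomaly_likeness(z_pool, z_anom, z_norm, metric)
    return np.argsort(-score)[:k], score

# Synthetic latent features standing in for autoencoder embeddings (illustrative only).
rng = np.random.default_rng(0)
dim = 8
z_pool = np.vstack([rng.normal(0.0, 1.0, (200, dim)),  # normal-like pool points: indices 0..199
                    rng.normal(3.0, 1.0, (5, dim))])   # anomaly-like pool points: indices 200..204
z_anom = rng.normal(3.0, 1.0, (2, dim))   # small set of oracle-labeled anomalies
z_norm = rng.normal(0.0, 1.0, (10, dim))  # small set of oracle-labeled normals

is_anomaly = np.arange(len(z_pool)) >= 200
for name in METRICS:
    idx, score = select_queries(z_pool, z_anom, z_norm, name, k=5)
    hits = int(is_anomaly[idx].sum())
    print(f"{name:>9}: {hits}/5 true anomalies among the top-5 queries")
```

Swapping the entry in `METRICS` changes only the geometry used to rank candidates, which is the kind of design choice whose effect on sample selection the paper sets out to evaluate; in a real loop the newly labeled points would be appended to `z_anom` or `z_norm` and the model retrained before the next round.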