Uncovering Semantics and Topics Utilized by Threat Actors to Deliver Malicious Attachments and URLs

Andrey Yakymovych; Abhishek Singh

arXiv:2407.08888 · cs.LG · July 15, 2024

TL;DR

This paper uses advanced unsupervised topic modeling and semantic analysis to uncover common themes and semantics in malicious emails, enhancing threat detection capabilities.
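
In BERTopic, each topic is described by ranking words with a class-based TF-IDF (c-TF-IDF): all documents in a cluster are treated as one class document, and a term's weight is its normalized frequency in that class multiplied by log(1 + A / f_t), where A is the average number of words per class and f_t is the term's frequency across all classes. The sketch below is a simplified, standard-library illustration of that weighting, not the authors' code or the BERTopic API; the function name `c_tf_idf`, the whitespace tokenizer, and the toy email snippets are hypothetical.

```python
import math
from collections import Counter


def c_tf_idf(clusters: dict[int, list[str]], top_n: int = 3) -> dict[int, list[str]]:
    """Rank words per cluster with a simplified class-based TF-IDF.

    `clusters` maps a cluster id to the (already preprocessed) documents in it.
    """
    # Join each cluster's documents into one "class document" and count its terms.
    counts = {cid: Counter(" ".join(docs).lower().split()) for cid, docs in clusters.items()}
    # f_t: frequency of each term summed over all classes.
    total_freq: Counter = Counter()
    for c in counts.values():
        total_freq.update(c)
    # A: average number of words per class.
    avg_words = sum(sum(c.values()) for c in counts.values()) / len(counts)

    topics = {}
    for cid, c in counts.items():
        n_words = sum(c.values())
        scores = {
            # L1-normalized term frequency times the class-based IDF term.
            term: (tf / n_words) * math.log(1 + avg_words / total_freq[term])
            for term, tf in c.items()
        }
        # Highest score first; ties broken alphabetically so output is deterministic.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        topics[cid] = ranked[:top_n]
    return topics


# Hypothetical clusters of email snippets.
clusters = {
    0: ["verify invoice attached", "invoice payment overdue attached"],
    1: ["reset password now", "password expired today"],
}
print(c_tf_idf(clusters, top_n=2))  # {0: ['attached', 'invoice'], 1: ['password', 'expired']}
```

Words repeated within one cluster but rare elsewhere rise to the top, which is why `invoice` and `password` characterize their clusters here.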

Contribution

It introduces a novel application of multilingual embedding models and clustering algorithms for semantic analysis of malicious email content, revealing threat actor patterns.
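
The clustering step groups emails whose embedding vectors lie close together in dense regions and leaves isolated emails as noise. HDBSCAN and OPTICS both extend the density-based idea behind DBSCAN, so the sketch below substitutes plain DBSCAN with cosine distance as a simpler stand-in; it is not HDBSCAN or OPTICS. The three-dimensional vectors are hypothetical placeholders for BGE-M3 embeddings (which are far higher-dimensional), and the `eps` and `min_samples` values are illustrative assumptions.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


def dbscan(points: list[list[float]], eps: float, min_samples: int) -> list[int]:
    """Return one cluster id per point; -1 marks noise."""
    labels: list = [None] * len(points)  # None = not visited yet

    def neighbors(i: int) -> list[int]:
        # All points (including i itself) within eps of point i.
        return [j for j in range(len(points)) if cosine_distance(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point becomes a border point
                continue
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is a core point, so keep expanding
    return labels


# Hypothetical 3-d stand-ins for email embeddings.
emails = [
    [1.0, 0.1, 0.0], [0.9, 0.2, 0.0], [0.95, 0.15, 0.05],
    [0.0, 1.0, 0.1], [0.1, 0.9, 0.0], [0.05, 0.95, 0.1],
    [0.0, 0.0, 1.0],
]
print(dbscan(emails, eps=0.05, min_samples=2))  # [0, 0, 0, 1, 1, 1, -1]
```

The first three vectors form cluster 0, the next three form cluster 1, and the last is labeled noise. Unlike this sketch, HDBSCAN and OPTICS do not require a single global `eps`, which matters when email clusters differ in density.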

Findings

1. Identifies common semantics in malicious emails
2. Compares clustering algorithms (HDBSCAN and OPTICS) on topic quantity, coherence, and diversity
3. Provides insights into the themes threat actors use to deliver malicious attachments and URLs
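
Topic diversity, one of the metrics used to compare the clustering algorithms, is commonly computed as the share of unique words among the top-k words of all topics: 1.0 means no word is shared between topics, and values near 0 mean the topics are redundant. The sketch below assumes that common definition (the paper's exact formulation is not stated here); the function name and the two topic lists are hypothetical.

```python
def topic_diversity(topics: list[list[str]], top_k: int = 10) -> float:
    """Fraction of unique words among the top-k words of every topic."""
    top_words = [word for topic in topics for word in topic[:top_k]]
    if not top_words:
        return 0.0
    return len(set(top_words)) / len(top_words)


# Hypothetical topic outputs from two clustering runs, for comparison.
runs = {
    "run_a": [["invoice", "payment", "attached"], ["password", "reset", "account"]],
    "run_b": [["invoice", "payment", "account"], ["password", "account", "invoice"]],
}
for name, topics in runs.items():
    # Report topic quantity alongside diversity, as in a side-by-side comparison.
    print(name, "topics:", len(topics), "diversity:", round(topic_diversity(topics, top_k=3), 3))
# run_a topics: 2 diversity: 1.0
# run_b topics: 2 diversity: 0.667
```

Diversity alone does not show whether a topic's words belong together, so it is usually read alongside a coherence score.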

Abstract

Recent threat reports highlight that email remains the top vector for delivering malware to endpoints. Despite these statistics, the detection of malicious email attachments and URLs often neglects semantic cues, namely linguistic features and contextual clues. Our study employs BERTopic unsupervised topic modeling to identify the common semantics and themes embedded in emails that deliver malicious attachments and call-to-action URLs. We preprocess emails by extracting and sanitizing their content, and we employ multilingual embedding models such as BGE-M3 to produce dense representations, which clustering algorithms (HDBSCAN and OPTICS) use to group emails by semantic similarity. Phi3-Mini-4K-Instruct facilitates semantic analysis and hLDA aids thematic analysis, helping to understand threat actor patterns. Our research will evaluate and compare different clustering algorithms on topic quantity, coherence, and diversity metrics, concluding…
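
The abstract does not specify how emails are extracted and sanitized. As one possible sketch, assuming Python's standard `email` package, the function below parses a raw message, keeps the subject and plain-text body, and masks URLs and email addresses with placeholder tokens so the embedding model sees the wording rather than specific indicators. The function name `extract_and_sanitize`, the `[URL]`/`[EMAIL]` placeholders, and the regular expressions are hypothetical choices, not the authors' pipeline.

```python
import re
from email import policy
from email.parser import BytesParser

# Hypothetical masking patterns; a production pipeline would need more robust ones.
URL_RE = re.compile(r"https?://\S+")
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def extract_and_sanitize(raw: bytes) -> str:
    """Return the subject and plain-text body with URLs and addresses masked."""
    msg = BytesParser(policy=policy.default).parsebytes(raw)
    parts = [str(msg.get("Subject", ""))]
    body = msg.get_body(preferencelist=("plain",))  # None if there is no text/plain part
    if body is not None:
        parts.append(body.get_content())
    text = "\n".join(parts)
    text = URL_RE.sub("[URL]", text)  # mask links before addresses, since URLs can contain "@"
    text = EMAIL_RE.sub("[EMAIL]", text)
    return re.sub(r"\s+", " ", text).strip()  # collapse whitespace into single spaces


raw = (
    b"From: attacker@example.com\r\n"
    b"Subject: Invoice overdue\r\n"
    b"Content-Type: text/plain; charset=utf-8\r\n"
    b"\r\n"
    b"Please pay at https://pay.example.com/inv?id=1 or reply to billing@example.net today.\r\n"
)
print(extract_and_sanitize(raw))
# Invoice overdue Please pay at [URL] or reply to [EMAIL] today.
```

Masking keeps the call-to-action phrasing ("Please pay at [URL]") intact while removing details that would otherwise split semantically identical campaigns into separate clusters.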

Taxonomy

Topics: Spam and Phishing Detection · Cybercrime and Law Enforcement Studies · Information and Cyber Security