Characterizing Phishing Threats with Natural Language Processing
Michael C. Kotson, Alexia Schulz

TL;DR
This paper uses NLP techniques to analyze a real-world spear phishing campaign, demonstrating how semantic similarity and clustering can identify targeted attacks and differentiate them from random spam.
Contribution
It introduces a method to quantify and characterize spear phishing attacks using NLP, focusing on semantic similarity and topical clustering of email content.
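To make "semantic similarity" concrete, here is a minimal sketch that scores how closely a CV matches a recipient's profile using TF-IDF weighting and cosine similarity. The summary does not say which similarity measure the authors used, so the measure, the function names (`tokenize`, `tfidf_vectors`, `cosine`), and the three example strings are all illustrative assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep alphabetic runs only (a deliberately crude tokenizer).
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    # Build one sparse TF-IDF vector (dict of term -> weight) per document.
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    vectors = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        vectors.append({
            term: count * (math.log((1 + n) / (1 + doc_freq[term])) + 1.0)
            for term, count in term_freq.items()
        })
    return vectors


def cosine(u, v):
    # Cosine similarity between two sparse vectors; 0.0 if either is empty.
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical texts, not drawn from the paper's data.
cv_text = "Research experience in radar signal processing and antenna design"
related_profile = "Group working on radar systems and signal processing"
unrelated_profile = "Human resources and payroll administration"

vecs = tfidf_vectors([cv_text, related_profile, unrelated_profile])
sim_related = cosine(vecs[0], vecs[1])
sim_unrelated = cosine(vecs[0], vecs[2])
```

In this toy case the CV scores higher against the topically related profile than against the unrelated one. That gap is the kind of signal a targeting analysis would look for across a whole campaign.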
Findings
Strong statistical evidence (p < 10^{-4}) of targeted content in the phishing emails.
Targeted recipients received topically clustered CVs.
The campaign specifically targeted certain demographics within the institution.
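One common way to attach a p-value to a claim like "the content was targeted" is a permutation test: compare the average similarity of the actual sender-to-recipient pairings against the averages obtained when recipients are shuffled at random. The sketch below illustrates that general idea only. The summary does not describe the authors' actual test, and the function name `permutation_p_value`, the similarity matrices, and the trial count are hypothetical.

```python
import random


def permutation_p_value(sim, trials=2000, seed=0):
    # sim[i][j] is the similarity between sent document i and recipient j;
    # the observed campaign pairing is assumed to be i -> i.
    rng = random.Random(seed)
    n = len(sim)
    observed = sum(sim[i][i] for i in range(n)) / n
    order = list(range(n))
    hits = 0
    for _ in range(trials):
        rng.shuffle(order)
        shuffled = sum(sim[i][order[i]] for i in range(n)) / n
        if shuffled >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (trials + 1)


# Toy matrix (assumed values): each document matches its own recipient well.
n = 8
targeted = [[0.9 if i == j else 0.1 for j in range(n)] for i in range(n)]
observed_targeted, p_targeted = permutation_p_value(targeted)

# Control: every pairing is equally similar, so shuffling changes nothing.
flat = [[0.5 for _ in range(n)] for _ in range(n)]
observed_flat, p_flat = permutation_p_value(flat)
```

The smallest p-value this estimator can report is 1 / (trials + 1), so supporting a bound such as p < 10^{-4} would require more than 10,000 shuffles.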
Abstract
Spear phishing is a widespread concern in the modern network security landscape, but there are few metrics that measure the extent to which reconnaissance is performed on phishing targets. Spear phishing emails closely match the expectations of the recipient, based on details of their experiences and interests, making them a popular propagation vector for harmful malware. In this work we use Natural Language Processing techniques to investigate a specific real-world phishing campaign and quantify attributes that indicate a targeted spear phishing attack. Our phishing campaign data sample comprises 596 emails - all containing a web bug and a Curriculum Vitae (CV) PDF attachment - sent to our institution by a foreign IP space. The campaign was found to exclusively target specific demographics within our institution. Performing a semantic similarity analysis between the senders' CV…
Taxonomy
Topics: Spam and Phishing Detection · Authorship Attribution and Profiling · Misinformation and Its Impacts
