Characterizing Phishing Threats with Natural Language Processing

Michael C. Kotson; Alexia Schulz

arXiv:1508.07885 · cs.CR · October 30, 2024

TL;DR

This paper uses NLP techniques to analyze a real-world spear phishing campaign, demonstrating how semantic similarity and clustering can identify targeted attacks and differentiate them from random spam.

Contribution

The paper introduces a method to quantify and characterize spear phishing attacks using NLP, focusing on semantic similarity and topical clustering of email content.
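
To make the idea of quantifying targeting concrete, the following is a minimal sketch, not the authors' implementation. It assumes each attached CV can be paired with a short text describing its recipient, scores each pair with bag-of-words cosine similarity (a simple stand-in for whatever semantic similarity measure the paper uses), and estimates by permutation how often a random reassignment of CVs to recipients would match as well. All function names and the toy strings in the usage note are hypothetical.

```python
import math
import random
from collections import Counter


def bag_of_words(text):
    """Lowercase the text, strip basic punctuation, and count alphabetic tokens."""
    tokens = [t.strip(".,;:()").lower() for t in text.split()]
    return Counter(t for t in tokens if t.isalpha())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mean_pair_similarity(cvs, profiles):
    """Average similarity over CV/recipient pairs taken in order."""
    return sum(cosine(c, p) for c, p in zip(cvs, profiles)) / len(cvs)


def permutation_p_value(cv_texts, profile_texts, trials=2000, seed=0):
    """Return (observed mean similarity, one-sided permutation p-value).

    The null hypothesis is that CVs were assigned to recipients at random,
    so recipient profiles are shuffled against a fixed list of CVs.
    """
    rng = random.Random(seed)
    cvs = [bag_of_words(t) for t in cv_texts]
    profiles = [bag_of_words(t) for t in profile_texts]
    observed = mean_pair_similarity(cvs, profiles)
    shuffled = profiles[:]
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if mean_pair_similarity(cvs, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (hits + 1) / (trials + 1)
```

A call such as `permutation_p_value(["radar signal processing engineer", "organic chemistry researcher"], ["radar signal processing group", "organic chemistry laboratory"])` returns the observed mean similarity and the estimated p-value. With only a handful of pairs the smallest reachable p-value is limited by the number of distinct permutations, so very small values need a sample on the scale of a real campaign.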

Findings

01. Strong statistical evidence (p < 10^{-4}) of targeted content in the phishing emails.

02. Targeted recipients received topically clustered CVs.

03. The campaign specifically targeted certain demographics within the institution.

Abstract

Spear phishing is a widespread concern in the modern network security landscape, but there are few metrics that measure the extent to which reconnaissance is performed on phishing targets. Spear phishing emails closely match the expectations of the recipient, based on details of their experiences and interests, making them a popular propagation vector for harmful malware. In this work we use Natural Language Processing techniques to investigate a specific real-world phishing campaign and quantify attributes that indicate a targeted spear phishing attack. Our phishing campaign data sample comprises 596 emails - all containing a web bug and a Curriculum Vitae (CV) PDF attachment - sent to our institution by a foreign IP space. The campaign was found to exclusively target specific demographics within our institution. Performing a semantic similarity analysis between the senders' CV…

Taxonomy

Topics: Spam and Phishing Detection · Authorship Attribution and Profiling · Misinformation and Its Impacts