Towards Text-based Phishing Detection

Gilchan Park; Julia M. Taylor

arXiv:2111.01676 · cs.CL · November 4, 2021

TL;DR

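This study uses a modified, non-semantic algorithm built on readily available resources to recognize phishing emails considerably better than the earlier work it builds on, at the cost of a slightly higher false positive rate. The authors expect that adding semantic analysis will lower the false positive rate without hurting detection accuracy.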

Contribution

It presents a modified version of a previously published, non-semantic phishing detection algorithm that uses the same readily available tools and recognizes phishing emails considerably better than the original.

Findings

1. Better phishing email recognition accuracy
2. Slightly higher false positive rate
3. Potential for semantic analysis to improve results

Abstract

This paper reports on an experiment in text-based phishing detection that uses readily available resources and no semantics. The developed algorithm is a modified version of previously published work and relies on the same tools. It recognizes phishing emails considerably better than the previously reported work, but its rate of text falsely identified as phishing is slightly worse. Adding a semantic component is expected to reduce the false positive rate while preserving detection accuracy.
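The abstract contrasts two measures: how many phishing emails are recognized, and how much legitimate text is falsely flagged. The sketch below shows one common way to compute both from labeled predictions. It is only an illustration: the function names and the toy labels are assumptions made here, the numbers are not results from the paper, and the paper may define its measures differently.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # phishing emails flagged as phishing
    fn: int  # phishing emails missed
    fp: int  # legitimate emails flagged as phishing
    tn: int  # legitimate emails left alone


def confusion(y_true, y_pred):
    """Count outcomes; label 1 means phishing, 0 means legitimate."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return Confusion(tp, fn, fp, tn)


def detection_rate(c):
    """Share of phishing emails that were recognized (true positive rate)."""
    total = c.tp + c.fn
    return c.tp / total if total else 0.0


def false_positive_rate(c):
    """Share of legitimate emails wrongly flagged as phishing."""
    total = c.fp + c.tn
    return c.fp / total if total else 0.0


# Hypothetical toy labels, not data from the paper.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

c = confusion(y_true, y_pred)
print(f"detection rate:      {detection_rate(c):.2f}")
print(f"false positive rate: {false_positive_rate(c):.2f}")
```

Reporting the two rates separately matters because a detector can raise one while worsening the other, which is the trade-off the abstract describes.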

Taxonomy

Topics: Spam and Phishing Detection · Text and Document Classification Technologies · Misinformation and Its Impacts