URLTran: Improving Phishing URL Detection Using Transformers

Pranav Maneriker; Jack W. Stokes; Edir Garcia Lazo; Diana Carutasu; Farid Tajaddodianfar; Arun Gururajan

arXiv:2106.05256·cs.CR·August 30, 2021

TL;DR

URLTran leverages transformer models to significantly enhance phishing URL detection accuracy, especially at very low false positive rates, and demonstrates robustness against common adversarial attacks.

Contribution

The paper introduces URLTran, a transformer-based model that outperforms existing deep learning methods in phishing URL detection and improves robustness against adversarial black-box attacks.

Findings

01. URLTran achieves a true positive rate (TPR) of 86.80% at a false positive rate (FPR) of 0.01%, outperforming the deep learning baselines.

02. Transformer models with domain-specific pre-training improve detection accuracy.

03. URLTran maintains a low FPR under adversarial homoglyph and compound word attacks.
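The headline metric above, TPR at a fixed FPR, can be read as follows: choose the most permissive score threshold that still keeps the share of benign URLs flagged at or below the FPR budget, then report the share of phishing URLs caught at that threshold. The snippet below is a minimal sketch of that calculation, not the paper's evaluation code. The function name `tpr_at_fpr`, the label convention (1 for phishing, 0 for benign), and the toy scores are assumptions made for illustration.

```python
def tpr_at_fpr(labels, scores, max_fpr):
    """Return (tpr, threshold) for the lowest threshold whose FPR <= max_fpr.

    Assumed conventions (hypothetical, for illustration only):
    labels are 1 for phishing and 0 for benign, higher scores mean
    "more likely phishing", and a URL is flagged when score >= threshold.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one phishing and one benign example")

    # Sweep candidate thresholds from the highest score downwards.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    best_tpr, best_threshold = 0.0, float("inf")
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every example with this score so ties are flagged together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        if fp / n_neg > max_fpr:
            break  # this threshold exceeds the FPR budget; keep the previous one
        best_tpr, best_threshold = tp / n_pos, threshold
    return best_tpr, best_threshold


# Toy data, invented for this example only.
toy_labels = [1, 1, 1, 0, 0, 0, 0, 1]
toy_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.95]
strict = tpr_at_fpr(toy_labels, toy_scores, max_fpr=0.0)
relaxed = tpr_at_fpr(toy_labels, toy_scores, max_fpr=0.25)
```

With an FPR budget as small as 0.01%, the benign test set needs at least 10,000 examples before even a single false positive fits within the budget, so estimates of this metric are only meaningful on large benign samples.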

Abstract

Browsers often include security features to detect phishing web pages. In the past, some browsers evaluated an unknown URL for inclusion in a list of known phishing pages. However, as the number of URLs and known phishing pages continued to increase at a rapid pace, browsers started to include one or more machine learning classifiers as part of their security services that aim to better protect end users from harm. While additional information could be used, browsers typically evaluate every unknown URL using some classifier in order to quickly detect these phishing pages. Early phishing detection used standard machine learning classifiers, but recent research has instead proposed the use of deep learning models for the phishing URL detection task. Concurrently, text embedding research using transformers has led to state-of-the-art results in many natural language processing tasks. In…
