Enhance the machine learning algorithm performance in phishing detection with keyword features

Zijiang Yang

arXiv:2508.09765 · cs.CR · August 14, 2025

TL;DR

This paper introduces a novel method for integrating keyword features that improves machine-learning-based phishing URL detection, reducing classification error by 30% on large datasets and achieving up to 99.68% accuracy without relying on third-party data.

Contribution

The paper proposes a new technique for incorporating keyword features that enhances traditional machine learning algorithms for phishing detection and is especially effective on small datasets.

Findings

1. Reduces classification error by 30% on large datasets
2. Achieves up to 99.68% accuracy with the proposed method
3. Is especially effective on small datasets
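Finding 1 is stated as a 30% reduction in classification error. Assuming this means a relative reduction of the error rate (1 - accuracy) rather than an absolute drop in percentage points, the arithmetic can be sketched as follows. The accuracies in the usage lines are made-up illustration values, not results from the paper; the only paper figure used is 99.68%, which corresponds to an error rate of 0.32%.

```python
def error_rate(accuracy: float) -> float:
    """Convert an accuracy in [0, 1] into an error rate in [0, 1]."""
    return 1.0 - accuracy


def relative_error_reduction(baseline_acc: float, improved_acc: float) -> float:
    """Fraction by which the error rate shrinks relative to the baseline.

    Assumption: "error reduction" is relative, i.e. (e_base - e_new) / e_base.
    """
    base_err = error_rate(baseline_acc)
    if base_err == 0.0:
        raise ValueError("a 100% baseline accuracy leaves no error to reduce")
    return (base_err - error_rate(improved_acc)) / base_err


if __name__ == "__main__":
    # Hypothetical accuracies for illustration only (not from the paper).
    print(round(relative_error_reduction(0.90, 0.93), 4))  # 0.3
    # 99.68% accuracy expressed as an error rate in percent.
    print(round(error_rate(0.9968) * 100, 2))              # 0.32
```

Under this reading, a 30% reduction means the improved model makes 70% as many mistakes as the baseline, so the same relative gain translates into a smaller absolute change when the baseline is already highly accurate.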

Abstract

Recently, there has been a significant increase in phishing attacks on the Internet. In a typical phishing attack, the attacker sets up a malicious website that resembles a legitimate one in order to obtain end users' information. This can lead to the leakage of sensitive information and financial loss for end users. To prevent such attacks, early detection of these websites' URLs is vital. Previous researchers have proposed many machine learning algorithms to distinguish phishing URLs from legitimate ones. In this paper, we aim to enhance these machine learning algorithms from the perspective of feature selection. We propose a novel method to incorporate keyword features with the traditional features. This method is applied to multiple traditional machine learning algorithms and the experimental results have shown this…
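The abstract describes combining keyword features with traditional URL features before classification. The paper's exact feature set is not given here, so the following is only a minimal sketch under stated assumptions: the lexical features (URL length, dots in the host, hyphens, an `@` sign, a raw IPv4 host, the HTTPS scheme) and the `SUSPICIOUS_KEYWORDS` list are hypothetical illustrations, and keyword features are modeled as binary presence indicators. It uses only the Python standard library (`urllib.parse.urlparse` and `re`).

```python
from __future__ import annotations

import re
from urllib.parse import urlparse

# Hypothetical keyword list, for illustration only; NOT the list used in the paper.
SUSPICIOUS_KEYWORDS = ("login", "verify", "account", "secure", "update", "bank", "signin")

# Matches a dotted-quad host such as 192.168.0.1 (does not validate octet ranges).
IPV4_RE = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def traditional_features(url: str) -> list[float]:
    """Hypothetical lexical features of the kind commonly used for URL classification."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return [
        float(len(url)),                   # overall URL length
        float(host.count(".")),            # number of dots in the host name
        float(url.count("-")),             # hyphens anywhere in the URL
        float("@" in url),                 # '@' can disguise the real host
        float(bool(IPV4_RE.match(host))),  # raw IPv4 address used as the host
        float(parsed.scheme == "https"),   # HTTPS scheme present
    ]


def keyword_features(url: str, keywords: tuple[str, ...] = SUSPICIOUS_KEYWORDS) -> list[float]:
    """Binary indicator per keyword: 1.0 if it appears anywhere in the lowercased URL."""
    lowered = url.lower()
    return [float(kw in lowered) for kw in keywords]


def combined_features(url: str) -> list[float]:
    """Concatenate traditional and keyword features into one vector."""
    return traditional_features(url) + keyword_features(url)


if __name__ == "__main__":
    print(combined_features("http://192.168.0.1/secure-login"))
```

The concatenated vector can be passed to any conventional classifier. Keeping the two extraction functions separate makes it straightforward to compare a model trained on the traditional features alone against one trained on the combined set, which is the kind of comparison the abstract reports across multiple algorithms.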