Enhance the machine learning algorithm performance in phishing detection with keyword features
Zijiang Yang

TL;DR
This paper introduces a novel method for integrating keyword features that improves machine learning-based phishing URL detection, reducing classification error by 30% on large datasets and achieving up to 99.68% accuracy without relying on third-party data.
Contribution
The paper proposes a new keyword feature incorporation technique that enhances traditional machine learning algorithms for phishing detection, especially effective on small datasets.
Findings
Reduces classification error by 30% on large datasets
Achieves up to 99.68% accuracy with the proposed method
Especially effective on small datasets
Abstract
Phishing attacks on the Internet have increased significantly in recent years. In a typical phishing attack, the attacker sets up a malicious website that imitates a legitimate one in order to obtain end users' information. This can lead to the leakage of sensitive information and financial loss for end users. To prevent such attacks, early detection of these websites' URLs is essential. Previous researchers have proposed many machine learning algorithms to distinguish phishing URLs from legitimate ones. In this paper, we enhance these algorithms from the perspective of feature selection. We propose a novel method to incorporate keyword features alongside the traditional features. This method is applied to multiple traditional machine learning algorithms, and the experimental results have shown this…
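The idea of combining keyword features with traditional URL features can be illustrated with a minimal sketch. It is not the paper's implementation: the keyword list, the choice of lexical features, and the function names (`traditional_features`, `keyword_features`, `combined_features`) are all assumptions made for illustration, since this summary does not specify them. The sketch only shows one plausible way to concatenate the two feature groups into a single vector that a conventional classifier could consume.

```python
from urllib.parse import urlparse

# Hypothetical keyword list, chosen for illustration only; the paper's
# actual keywords and how they are selected are not given in this summary.
KEYWORDS = ("login", "secure", "verify", "account", "update")


def traditional_features(url: str) -> list[float]:
    """A few common lexical URL features (an assumed, illustrative set)."""
    host = urlparse(url).netloc
    return [
        float(len(url)),                       # total URL length
        float(url.count(".")),                 # number of dots
        float(sum(c.isdigit() for c in url)),  # number of digit characters
        float("@" in url),                     # 1.0 if '@' appears in the URL
        float(host.count("-")),                # hyphens in the host name
    ]


def keyword_features(url: str) -> list[float]:
    """One binary indicator per keyword: 1.0 if it occurs in the URL."""
    lowered = url.lower()
    return [float(kw in lowered) for kw in KEYWORDS]


def combined_features(url: str) -> list[float]:
    """Concatenate traditional and keyword features into one vector."""
    return traditional_features(url) + keyword_features(url)
```

Calling `combined_features("http://secure-login.example.com/verify?id=123")` yields a 10-element vector: five lexical values followed by five keyword indicators. Everything here is computed from the URL string alone, which matches the summary's point that the approach does not rely on third-party data.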
