Phishing Attacks and Websites Classification Using Machine Learning and Multiple Datasets (A Comparative Analysis)
Sohail Ahmed Khan, Wasiq Khan, Abir Hussain

TL;DR
This paper compares machine learning algorithms for phishing website detection across multiple datasets and finds that random forest and artificial neural networks perform best, exceeding 97% accuracy on the identified features.
Contribution
The paper provides a comprehensive analysis of ML algorithms for phishing detection, identifies the most significant features in each dataset, and compares classification performance across multiple datasets.
Findings
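Random forest and artificial neural networks outperform the other classification algorithms evaluated.
Both achieve over 97% accuracy in phishing classification using the identified features.
Classification performance is compared between the full datasets and the reduced-dimensional datasets built from the most significant features.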
Abstract
Phishing attacks are the most common type of cyber-attack used to obtain sensitive information and have been affecting individuals as well as organisations across the globe. Various techniques have been proposed to identify phishing attacks, in particular the deployment of machine intelligence in recent years. However, the deployed algorithms and discriminating factors are very diverse in existing works. In this study, we present a comprehensive analysis of various machine learning algorithms to evaluate their performance over multiple datasets. We further investigate the most significant features within multiple datasets and compare the classification performance with the reduced-dimensional datasets. The statistical results indicate that random forest and artificial neural network outperform other classification algorithms, achieving over 97% accuracy using the identified features.
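The workflow the abstract describes (train several classifiers, rank features, then re-evaluate on a reduced feature set) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it uses scikit-learn, a synthetic dataset from `make_classification` as a stand-in for real phishing data, impurity-based random forest importances as the ranking method, and hypothetical helper names (`make_models`, `evaluate`, `top_k_features`). The hyperparameters and the choice of 10 retained features are arbitrary.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def make_models(seed=0):
    """Build fresh, unfitted classifiers (hyperparameters are illustrative)."""
    return {
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=seed),
        # Scale inputs before the neural network and the linear baseline.
        "neural_network": make_pipeline(
            StandardScaler(),
            MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=seed),
        ),
        "logistic_regression": make_pipeline(
            StandardScaler(), LogisticRegression(max_iter=1000)
        ),
    }


def evaluate(X_train, X_test, y_train, y_test, seed=0):
    """Fit every model on the training split and return test accuracy per model."""
    scores = {}
    for name, model in make_models(seed).items():
        model.fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, model.predict(X_test))
    return scores


def top_k_features(X_train, y_train, k, seed=0):
    """Return indices of the k most important features, ranked on training data only."""
    forest = RandomForestClassifier(n_estimators=100, random_state=seed)
    forest.fit(X_train, y_train)
    return np.argsort(forest.feature_importances_)[::-1][:k]


# Synthetic stand-in for a phishing dataset (assumption: binary labels, 30 features).
X, y = make_classification(
    n_samples=1000, n_features=30, n_informative=8, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

full_scores = evaluate(X_train, X_test, y_train, y_test)

# Keep the 10 highest-ranked features and repeat the comparison.
selected = top_k_features(X_train, y_train, k=10)
reduced_scores = evaluate(
    X_train[:, selected], X_test[:, selected], y_train, y_test
)

for name in full_scores:
    print(f"{name}: full={full_scores[name]:.3f} reduced={reduced_scores[name]:.3f}")
```

Ranking features on the training split alone keeps the held-out test set untouched by the selection step, so the reduced-feature accuracies remain comparable to the full-feature ones. Accuracies on this synthetic data say nothing about the figures reported in the paper.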
