Man versus Machine: AutoML and Human Experts' Role in Phishing Detection
Rizka Purwanto, Arindam Pal, Alan Blair, Sanjay Jha

TL;DR
This paper evaluates whether AutoML frameworks can outperform human-designed models in phishing detection. It finds that AutoML excels on complex, non-linear datasets but still requires expert input for real-world applications.
Contribution
It provides a comparative analysis of AutoML frameworks versus human-crafted models across multiple phishing datasets, highlighting strengths and limitations of AutoML in cybersecurity.
Findings
AutoML outperforms manually crafted models on complex, non-linear datasets.
AutoML struggles with incremental learning and unlabeled data in real-world scenarios.
Human expertise remains crucial for effective phishing detection systems.
Abstract
Machine learning (ML) has developed rapidly in the past few years and has been used successfully for a broad range of tasks, including phishing detection. However, building an effective ML-based detection system is not a trivial task and requires data scientists with knowledge of the relevant domain. Automated Machine Learning (AutoML) frameworks have received a lot of attention in recent years, enabling non-ML experts to build machine learning models. This raises the intriguing question of whether AutoML can outperform the results achieved by human data scientists. Our paper compares the performance of six well-known, state-of-the-art AutoML frameworks on ten different phishing datasets to see whether AutoML-based models can outperform manually crafted machine learning models. Our results indicate that AutoML-based models are able to outperform manually developed machine…
Taxonomy
Topics: Spam and Phishing Detection · Imbalanced Data Classification Techniques · Data Stream Mining Techniques
