Evaluation of Federated Learning in Phishing Email Detection
Chandra Thapa, Jun Wen Tang, Alsharif Abuadbba, Yansong Gao, Seyit Camtepe, Surya Nepal, Mahathir Almashor, Yifeng Zheng

TL;DR
This paper investigates the effectiveness of federated learning for phishing email detection using deep neural networks, analyzing performance under various data distributions and organizational settings, and highlighting its potential and limitations.
Contribution
To the best of the authors' knowledge, it is the first study to evaluate federated learning in email anti-phishing, comparing its performance to centralized models and analyzing the effects of data distribution and the number of participating organizations.
Findings
Federated learning achieves comparable performance to centralized models on balanced datasets.
Increasing the number of participating organizations can decrease accuracy for the RNN-based model but improve it for BERT.
FL convergence speed improves with more organizations and data, but performance suffers with highly asymmetric data distributions.
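To make the role of data distribution in these findings concrete, here is a minimal sketch of one server-side aggregation round. It assumes sample-size-weighted federated averaging (FedAvg), a common FL aggregation rule; this summary does not state which rule the paper uses. The function name `fed_avg` and the flat-list representation of model weights are hypothetical, chosen only for illustration.

```python
from typing import List, Sequence


def fed_avg(client_weights: Sequence[Sequence[float]],
            client_sizes: Sequence[int]) -> List[float]:
    """Average client parameter vectors, weighted by local sample counts.

    Assumption: each organization sends a flat list of model parameters
    plus the number of emails it trained on. Real models would use tensors.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client update")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    dim = len(client_weights[0])
    aggregated = [0.0] * dim
    for weights, size in zip(client_weights, client_sizes):
        if len(weights) != dim:
            raise ValueError("all client updates must have the same length")
        for i, value in enumerate(weights):
            # Larger local datasets pull the global model toward their update.
            aggregated[i] += value * size / total
    return aggregated


# Two hypothetical organizations with asymmetric data (1 vs. 3 samples).
global_weights = fed_avg([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```

Under this weighting, an organization holding most of the emails dominates the global model, which is one way to see why highly asymmetric distributions can affect the result.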
Abstract
The use of Artificial Intelligence (AI) to detect phishing emails is primarily dependent on large-scale centralized datasets, which opens it up to a myriad of privacy, trust, and legal issues. Moreover, organizations are loath to share emails, given the risk of leakage of commercially sensitive information. So, it is uncommon to obtain sufficient emails to train a global AI model efficiently. Accordingly, privacy-preserving distributed and collaborative machine learning, particularly Federated Learning (FL), is a desideratum. Although FL is already prevalent in the healthcare sector, questions remain regarding the effectiveness and efficacy of FL-based phishing detection within the context of multi-organization collaborations. To the best of our knowledge, the work herein is the first to investigate the use of FL in email anti-phishing. This paper builds upon a deep neural network model,…
Taxonomy
Topics: Spam and Phishing Detection · Privacy-Preserving Technologies in Data · Cooperative Communication and Network Coding
Methods: Attention Is All You Need · Linear Layer · Softmax · Linear Warmup With Linear Decay · Layer Normalization · WordPiece · Attention Dropout · Dropout · Weight Decay
