Evaluation of Federated Learning in Phishing Email Detection

Chandra Thapa; Jun Wen Tang; Alsharif Abuadbba; Yansong Gao; Seyit Camtepe; Surya Nepal; Mahathir Almashor; Yifeng Zheng

arXiv:2007.13300 · cs.LG · May 24, 2021 · 5 cites

TL;DR

This paper investigates the effectiveness of federated learning for phishing email detection using deep neural networks, analyzing performance under various data distributions and organizational settings, and highlighting its potential and limitations.

Contribution

It is the first study to evaluate federated learning in email anti-phishing, comparing its performance to centralized models and analyzing the effects of data distribution and the number of participating organizations.

Findings

01

Federated learning achieves comparable performance to centralized models on balanced datasets.

02

Increasing the number of organizations can decrease accuracy for the RNN model but improve performance for BERT.

03

FL convergence speed improves with more organizations and data, but performance suffers with highly asymmetric data distributions.

Abstract

The use of Artificial Intelligence (AI) to detect phishing emails is primarily dependent on large-scale centralized datasets, which opens it up to a myriad of privacy, trust, and legal issues. Moreover, organizations are loath to share emails, given the risk of leakage of commercially sensitive information. So, it is uncommon to obtain sufficient emails to train a global AI model efficiently. Accordingly, privacy-preserving distributed and collaborative machine learning, particularly Federated Learning (FL), is a desideratum. Although FL is already prevalent in the healthcare sector, questions remain regarding the effectiveness and efficacy of FL-based phishing detection within the context of multi-organization collaborations. To the best of our knowledge, the work herein is the first to investigate the use of FL in email anti-phishing. This paper builds upon a deep neural network model,…

Taxonomy

Topics: Spam and Phishing Detection · Privacy-Preserving Technologies in Data · Cooperative Communication and Network Coding

Methods: Attention Is All You Need · Linear Layer · Softmax · Linear Warmup With Linear Decay · Layer Normalization · WordPiece · Attention Dropout · Dropout · Weight Decay