Fed-urlBERT: Client-side Lightweight Federated Transformers for URL Threat Analysis

Yujie Li; Yanbin Wang; Haitao Xu; Zhenhao Guo; Fan Zhang; Ruitong Liu; Wenrui Ma

arXiv:2312.03636 · cs.CR · December 7, 2023 · 1 citation

TL;DR

Fed-urlBERT introduces a lightweight federated transformer model for URL threat detection that preserves privacy, reduces computational and bandwidth costs, and maintains high performance across diverse data scenarios.

Contribution

The paper presents Fed-urlBERT, a novel split learning-based federated transformer model tailored for URL threat analysis, balancing privacy, efficiency, and accuracy.

Findings

01. Achieves comparable performance to centralized models in IID and non-IID scenarios.
02. Reduces false positive rate by approximately 7% compared to centralized models.
03. Demonstrates effective mitigation of client heterogeneity through adaptive local aggregation.
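
Finding 03 names adaptive local aggregation without describing it on this page. The sketch below shows one published form of the idea, FedALA-style element-wise blending, where each client learns per-parameter mixing weights that decide how much of the global model to adopt. Whether Fed-urlBERT uses exactly this rule is an assumption; the quadratic toy loss, the `target` vector, and the function names are hypothetical and chosen only for illustration.

```python
def blend(local, global_, mix):
    """Element-wise blend: local + (global - local) * mix, with mix in [0, 1]."""
    return [l + (g - l) * m for l, g, m in zip(local, global_, mix)]


def learn_mix(local, global_, target, steps=50, lr=0.1):
    """Learn mixing weights by gradient descent on a toy local loss.

    Assumption: the local loss is sum((theta_k - target_k)^2), where `target`
    stands in for whatever parameters fit this client's own data best.
    """
    mix = [1.0] * len(local)  # mix = 1 means "overwrite with the global model"
    for _ in range(steps):
        blended = blend(local, global_, mix)
        for k in range(len(mix)):
            # d(loss)/d(mix_k) via the chain rule through the blend.
            grad = 2.0 * (blended[k] - target[k]) * (global_[k] - local[k])
            # Gradient step, then clip back into [0, 1].
            mix[k] = min(1.0, max(0.0, mix[k] - lr * grad))
    return mix


# Toy case: the global model helps on parameter 0 but hurts on parameter 1.
local_params = [0.0, 1.0]
global_params = [1.0, 0.0]
client_optimum = [1.0, 1.0]

mix = learn_mix(local_params, global_params, client_optimum)
personalized = blend(local_params, global_params, mix)
print(mix, personalized)
```

In this toy case the learned weights keep the global value where it helps the client and retain the local value where it does not, which is the intuition behind using such a step to handle heterogeneous (non-IID) clients.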

Abstract

In evolving cyber landscapes, the detection of malicious URLs calls for cooperation and knowledge sharing across domains. However, collaboration is often hindered by concerns over privacy and business sensitivities. Federated learning addresses these issues by enabling multi-client collaboration without direct data exchange. Unfortunately, if highly expressive Transformer models are used, clients may face intolerable computational burdens, and the exchange of weights could quickly deplete network bandwidth. In this paper, we propose Fed-urlBERT, a federated URL pre-trained model designed to address both privacy concerns and the need for cross-domain collaboration in cybersecurity. Fed-urlBERT leverages split learning to divide the pre-trained model into a client part and a server part, so that the client part requires less computation and bandwidth. Our approach achieves…
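
The split-learning idea in the abstract can be illustrated with a minimal sketch: the client runs only the first part of the model and sends intermediate activations to the server, and the server sends back the gradient of those activations. This is not the authors' implementation. The tiny linear layers, the class names, the three-feature toy inputs, and the fixed initial weights are all assumptions standing in for the client-side and server-side Transformer layers.

```python
import math


class ClientPart:
    """First (small) part of the model. Raw inputs never leave the client."""

    def __init__(self, weights):
        self.weights = [row[:] for row in weights]  # shape: hidden x input
        self._last_input = None

    def forward(self, x):
        self._last_input = x
        return [sum(w * xi for w, xi in zip(row, x)) for row in self.weights]

    def backward(self, grad_hidden, lr):
        # Finish backpropagation locally using the gradient sent by the server.
        for i, g in enumerate(grad_hidden):
            for j, xj in enumerate(self._last_input):
                self.weights[i][j] -= lr * g * xj


class ServerPart:
    """Remaining part of the model plus the loss. Sees activations only."""

    def __init__(self, weights, bias=0.0):
        self.weights = weights[:]
        self.bias = bias

    def train_step(self, hidden, label, lr):
        z = sum(w * h for w, h in zip(self.weights, hidden)) + self.bias
        prob = 1.0 / (1.0 + math.exp(-z))
        prob = min(max(prob, 1e-12), 1.0 - 1e-12)  # keep log() finite
        loss = -(label * math.log(prob) + (1 - label) * math.log(1 - prob))
        dz = prob - label
        # Gradient w.r.t. the activations, computed before the weight update.
        grad_hidden = [dz * w for w in self.weights]
        self.weights = [w - lr * dz * h for w, h in zip(self.weights, hidden)]
        self.bias -= lr * dz
        return loss, grad_hidden


def train_epoch(client, server, data, lr=0.1):
    total = 0.0
    for x, y in data:
        hidden = client.forward(x)                            # client -> server
        loss, grad_hidden = server.train_step(hidden, y, lr)  # server -> client
        client.backward(grad_hidden, lr)
        total += loss
    return total / len(data)


# Hypothetical toy data: 3 numeric features per URL, label 1 = malicious.
TOY_DATA = [
    ([1.0, 0.0, 1.0], 1),
    ([0.0, 1.0, 0.0], 0),
    ([1.0, 1.0, 1.0], 1),
    ([0.0, 1.0, 1.0], 0),
]

client = ClientPart([[0.1, -0.2, 0.05], [-0.1, 0.3, 0.2]])
server = ServerPart([0.2, -0.1])
losses = [train_epoch(client, server, TOY_DATA) for _ in range(50)]
print(losses[0], losses[-1])
```

Only the activation vector and its gradient cross the client/server boundary in this sketch, so the traffic per example scales with the size of the cut layer rather than with the size of the full model.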

Code & Models

Repositories

davidup1/fedurlbert (PyTorch, official)

Taxonomy

Topics: Spam and Phishing Detection · Network Security and Intrusion Detection · HIV, Drug Use, Sexual Risk