URLNet: Learning a URL Representation with Deep Learning for Malicious URL Detection

Hung Le; Quang Pham; Doyen Sahoo; Steven C.H. Hoi

arXiv:1802.03162 · cs.CR · March 5, 2018 · 193 cites

TL;DR

URLNet is a deep learning framework that automatically learns URL representations for malicious URL detection, overcoming limitations of manual feature engineering and capturing semantic patterns in URLs.

Contribution

It introduces an end-to-end CNN-based model that learns URL embeddings directly from characters and words, improving detection accuracy over traditional methods.
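The character-level half of this idea can be sketched in plain Python: encode each character as an integer ID, look up an embedding vector, slide convolution filters over the sequence, and max-pool each filter's responses into one feature. This is a minimal sketch under stated assumptions, not URLNet's implementation: the vocabulary (`string.printable`), the sizes `MAX_LEN`, `EMB_DIM`, `KERNEL` and `N_FILTERS`, and the helper names are hypothetical, and the weights are random rather than trained.

```python
import random
import string

# Hypothetical hyperparameters for illustration only; not URLNet's settings.
MAX_LEN = 32    # URLs are truncated or zero-padded to this many characters
EMB_DIM = 4     # size of each character embedding vector
KERNEL = 3      # convolution window width, in characters
N_FILTERS = 5   # number of convolution filters (= output feature size)

# Character vocabulary: ID 0 is padding, ID 1 is "unknown character".
VOCAB = {ch: i + 2 for i, ch in enumerate(string.printable)}


def encode_chars(url, max_len=MAX_LEN):
    """Map a URL to a fixed-length list of character IDs."""
    ids = [VOCAB.get(ch, 1) for ch in url[:max_len]]
    return ids + [0] * (max_len - len(ids))


def make_params(seed=0):
    """Randomly initialised embeddings and filters (a trained model would learn these)."""
    rng = random.Random(seed)
    emb = [[rng.uniform(-1, 1) for _ in range(EMB_DIM)] for _ in range(len(VOCAB) + 2)]
    emb[0] = [0.0] * EMB_DIM  # padding contributes nothing
    filters = [
        [[rng.uniform(-1, 1) for _ in range(EMB_DIM)] for _ in range(KERNEL)]
        for _ in range(N_FILTERS)
    ]
    biases = [0.0] * N_FILTERS
    return emb, filters, biases


def char_cnn_features(url, params):
    """Embed characters, apply a 1D convolution with ReLU, then max-over-time pooling."""
    emb, filters, biases = params
    x = [emb[i] for i in encode_chars(url)]  # MAX_LEN rows of EMB_DIM values
    features = []
    for w, b in zip(filters, biases):
        activations = []
        for t in range(len(x) - KERNEL + 1):
            # Dot product of the filter with KERNEL consecutive character embeddings
            s = b + sum(w[k][d] * x[t + k][d] for k in range(KERNEL) for d in range(EMB_DIM))
            activations.append(max(0.0, s))  # ReLU
        features.append(max(activations))    # strongest response anywhere in the URL
    return features


params = make_params()
vec = char_cnn_features("http://example.com/login.php", params)
print(len(vec))  # 5: one pooled value per filter
```

In a full model, vectors like `vec` would be trained end to end together with a classifier on labelled URLs; the word-level branch mentioned above is omitted here.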

Findings

01. Significant performance improvement over existing methods
02. Effective capture of semantic information in URLs
03. Robustness to unseen URL features
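One way a word-level model can still say something useful about words it never saw in training is to build each word's vector partly from its characters. The sketch below assumes a simple scheme: a word-level embedding (or one shared unknown-word vector) plus the sum of the word's character embeddings, with a tokenizer that keeps each special character as its own token. Both choices are illustrative assumptions that may differ from URLNet's exact design, and `tokenize_words` and `WordEncoder` are hypothetical names.

```python
import random
import re
import string

EMB_DIM = 4  # hypothetical embedding size


def tokenize_words(url):
    """Split into alphanumeric runs, keeping each special character as its own token."""
    return re.findall(r"[A-Za-z0-9]+|[^A-Za-z0-9]", url)


class WordEncoder:
    """Word vector = word embedding (or shared <UNK>) + sum of its character embeddings."""

    def __init__(self, known_words, seed=0):
        rng = random.Random(seed)

        def rand_vec():
            return [rng.uniform(-1, 1) for _ in range(EMB_DIM)]

        self.word_emb = {w: rand_vec() for w in known_words}
        self.unk_word = rand_vec()  # every unseen word shares this vector
        self.char_emb = {ch: rand_vec() for ch in string.printable}
        self.unk_char = rand_vec()

    def word_only(self, word):
        """Pure word-level lookup: all unseen words collapse to the same vector."""
        return self.word_emb.get(word, self.unk_word)

    def encode(self, word):
        """Word-level part plus character-level part, so unseen words still differ."""
        base = self.word_only(word)
        chars = [self.char_emb.get(ch, self.unk_char) for ch in word]
        return [base[d] + sum(c[d] for c in chars) for d in range(EMB_DIM)]


enc = WordEncoder(tokenize_words("http://www.paypal.com/signin"))
print(tokenize_words("http://a.b/c?d=1"))
# ['http', ':', '/', '/', 'a', '.', 'b', '/', 'c', '?', 'd', '=', '1']
print(enc.word_only("paypa1") == enc.word_only("amaz0n"))  # True: both map to <UNK>
print(enc.encode("paypa1") == enc.encode("amaz0n"))        # False: characters differ
```

The design choice being illustrated: a pure word lookup gives every unseen token the same vector, while adding character information keeps unseen tokens distinguishable from one another.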

Abstract

Malicious URLs host unsolicited content and are used to perpetrate cybercrimes. It is imperative to detect them in a timely manner. Traditionally, this is done through blacklists, which cannot be exhaustive and cannot detect newly generated malicious URLs. To address this, recent years have witnessed several efforts to perform Malicious URL Detection using Machine Learning. The most popular and scalable approaches use lexical properties of the URL string by extracting Bag-of-Words-like features, followed by applying machine learning models such as SVMs. Other features designed by experts are also used to improve the prediction performance of the model. These approaches suffer from several limitations: (i) Inability to effectively capture semantic meaning and sequential patterns in URL strings; (ii) Requiring substantial manual feature engineering; and (iii) Inability to…
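For contrast, the traditional lexical pipeline described in the abstract can be sketched as splitting the URL on special characters and counting tokens. This is a minimal sketch assuming a simple delimiter-based tokenizer and a sparse count vector; `lexical_tokens`, `build_vocab`, `bow_vector` and the example URLs are hypothetical, and the SVM step is omitted. It makes limitation (i) concrete: token order is discarded, and a look-alike token contributes nothing.

```python
import re
from collections import Counter


def lexical_tokens(url):
    """Lower-case the URL and split it on runs of non-alphanumeric characters."""
    return [t for t in re.split(r"[^a-z0-9]+", url.lower()) if t]


def build_vocab(urls):
    """Assign each distinct training token a feature index, in order of first appearance."""
    vocab = {}
    for url in urls:
        for tok in lexical_tokens(url):
            vocab.setdefault(tok, len(vocab))
    return vocab


def bow_vector(url, vocab):
    """Sparse bag-of-words vector {feature index: count}; unseen tokens are dropped."""
    counts = Counter(lexical_tokens(url))
    return {vocab[t]: c for t, c in counts.items() if t in vocab}


train = ["http://www.paypal.com/signin", "http://evil.net/login.bank.com"]
vocab = build_vocab(train)

# Token order is lost: these two URLs get identical feature vectors.
print(bow_vector("http://bank.com/login.evil.net", vocab) ==
      bow_vector("http://evil.net/login.bank.com", vocab))  # True

# The look-alike token "paypa1" is absent from training, so it yields no feature.
print(bow_vector("http://www.paypa1.com/signin", vocab))  # {0: 1, 1: 1, 3: 1, 4: 1}
```

In the pipeline the abstract describes, vectors like these (possibly extended with expert-designed features) are then passed to a classifier such as an SVM.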

Taxonomy

Topics: Spam and Phishing Detection · Misinformation and Its Impacts · Advanced Malware Detection Techniques