Look Before You Leap: Detecting Phishing Web Pages by Exploiting Raw URL And HTML Characteristics

Chidimma Opara; Yingke Chen; Bo Wei

arXiv:2011.04412·cs.CR·September 6, 2023

TL;DR

This paper introduces WebPhish, a deep learning model that detects phishing websites by analyzing raw URL and HTML content, achieving high accuracy without manual feature engineering.

Contribution

It presents an end-to-end neural network that automatically learns features from raw URL and HTML data, overcoming limitations of traditional handcrafted feature-based methods.
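To make "learning from raw URL and HTML data" concrete, the following is a minimal sketch of how raw text could be turned into fixed-length integer sequences suitable for an embedding layer. The summary above does not specify WebPhish's tokenization or sequence lengths, so the character-level encoding, the helper names (`build_vocab`, `encode`, `encode_page`), and the lengths 32 and 64 are all assumptions made for illustration, not the authors' implementation.

```python
from typing import Dict, List

PAD, UNK = 0, 1  # reserved ids: padding and unknown characters (assumed convention)


def build_vocab(samples: List[str]) -> Dict[str, int]:
    """Map each distinct character in the samples to an integer id >= 2."""
    chars = sorted({ch for text in samples for ch in text})
    return {ch: i + 2 for i, ch in enumerate(chars)}


def encode(text: str, vocab: Dict[str, int], max_len: int) -> List[int]:
    """Truncate or right-pad text to exactly max_len character ids."""
    ids = [vocab.get(ch, UNK) for ch in text[:max_len]]
    return ids + [PAD] * (max_len - len(ids))


def encode_page(url: str, html: str, vocab: Dict[str, int],
                url_len: int = 32, html_len: int = 64) -> List[int]:
    """Concatenate the encoded URL and HTML into one fixed-length input.

    The lengths are hypothetical; a real model would tune them on its data.
    """
    return encode(url, vocab, url_len) + encode(html, vocab, html_len)


# Hypothetical example page, not taken from the paper's dataset.
url = "http://example.com/login"
html = "<form action='/submit'><input name='pw'></form>"
vocab = build_vocab([url, html])
features = encode_page(url, html, vocab)
print(len(features))  # 96
```

No hand-crafted features (URL length, number of dots, presence of forms) are computed here; the integer sequence would be passed to a learned embedding so that the network can discover useful patterns on its own.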

Findings

1. Achieved 98.1% accuracy in phishing detection
2. Outperformed baseline approaches in experiments
3. Effectively models semantic dependencies in URL and HTML content

Abstract

Phishing websites distribute unsolicited content and are frequently used to commit email and internet fraud; detecting them before any user information is submitted is critical. Several efforts have been made to detect these phishing websites in recent years. Most existing approaches use hand-crafted lexical and statistical features from a website's textual content to train classification models to detect phishing web pages. However, these phishing detection approaches have a few challenges, including 1) the tediousness of extracting hand-crafted features, which require specialized domain knowledge to determine which features are useful for a particular platform; and 2) the difficulties encountered by models built on hand-crafted features to capture the semantic patterns in words and characters in URL and HTML content. To address these challenges, this paper proposes WebPhish, an…
