HTMLPhish: Enabling Phishing Web Page Detection by Applying Deep Learning Techniques on HTML Analysis

Chidimma Opara, Bo Wei, and Yingke Chen

arXiv:1909.01135·cs.CR·November 9, 2020

TL;DR

HTMLPhish introduces a deep learning approach using CNNs to automatically detect phishing web pages from HTML content, achieving high accuracy and language independence without manual feature engineering.

Contribution

The paper presents HTMLPhish, a novel CNN-based method for phishing detection that leverages HTML content embeddings and manages language variability effectively.

Findings

01. Over 93% accuracy on a large dataset

02. Language-independent detection capability

03. Effective handling of new features through combined embeddings

Abstract

Developing and deploying phishing attacks now requires little technical skill and low cost. This has led to an ever-growing number of phishing attacks on the World Wide Web. Consequently, proactive techniques to fight phishing attacks have become essential. In this paper, we propose HTMLPhish, a deep learning based, data-driven, end-to-end automatic phishing web page classification approach. Specifically, HTMLPhish receives the content of the HTML document of a web page and employs Convolutional Neural Networks (CNNs) to learn the semantic dependencies in the textual contents of the HTML. The CNNs learn appropriate feature representations from the HTML document embeddings without extensive manual feature engineering. Furthermore, our proposed approach of concatenating the word and character embeddings allows our model to manage new features and…
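
The abstract's idea of pairing word-level and character-level views of the HTML can be illustrated with a minimal input-encoding sketch. This is not the authors' code: the tokenizer, the `PAD`/`UNK` indices, the sequence lengths, and the sample pages are all assumptions made for illustration. It only shows why a character-level view helps with unseen tokens: a word missing from the training vocabulary collapses to `UNK`, while its characters are still individually known.

```python
import re

# Hypothetical special indices; not taken from the paper.
PAD, UNK = 0, 1

def tokenize_words(html):
    """Split HTML into word-like tokens and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", html.lower())

def build_vocab(sequences):
    """Map each distinct token to an index, reserving 0 (PAD) and 1 (UNK)."""
    vocab = {}
    for seq in sequences:
        for tok in seq:
            vocab.setdefault(tok, len(vocab) + 2)
    return vocab

def encode(tokens, vocab, length):
    """Convert tokens to indices, truncating or padding to a fixed length."""
    ids = [vocab.get(tok, UNK) for tok in tokens[:length]]
    return ids + [PAD] * (length - len(ids))

# Toy "training" page, invented for this example.
train_pages = ['<form action="login.php"><input type="password"></form>']
word_vocab = build_vocab(tokenize_words(p) for p in train_pages)
char_vocab = build_vocab(list(p.lower()) for p in train_pages)

# An unseen page containing a word ("signin") absent from the word vocabulary.
new_page = '<form action="signin.php">'
word_ids = encode(tokenize_words(new_page), word_vocab, 12)
char_ids = encode(list(new_page.lower()), char_vocab, 32)

print("word-level UNK count:", word_ids.count(UNK))
print("char-level UNK count:", char_ids.count(UNK))
```

In a full model, each index sequence would typically feed its own embedding layer, with the resulting embeddings combined before the convolutional layers; the excerpt above does not specify those details, so they are left out of the sketch.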
